Axioms and Algorithms for Inferences Involving Probabilistic Independence Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991.

Presentation transcript:

Axioms and Algorithms for Inferences Involving Probabilistic Independence Dan Geiger, Azaria Paz, and Judea Pearl, Information and Computation 91(1), March 1991, Presentation by Guy Moses & Omer Weissbrod for the course Bayesian Networks Computer Science Faculty, Technion – winter 2009 partially based on the presentation by Ilan Gronau

2 What's ahead?
- Introduction: some definitions, notations and reminders.
- Proof of Completeness: "if it's true, it can be proved".
- Preparations for the Membership Algorithm: more definitions, and some theoretical groundwork.
- The Membership Algorithm: description, proof of correctness, complexity analysis.

3 Definitions
U (Universe): a set of random variables with probability distribution P.
X, Y: finite sets of random variables, X = {x_1, …, x_n}, Y = {y_1, …, y_m}.
P(X,Y) = P(X)·P(Y) is short-hand notation for the equality
Pr{x_1=a_1, …, x_n=a_n, y_1=b_1, …, y_m=b_m} = Pr{x_1=a_1, …, x_n=a_n} · Pr{y_1=b_1, …, y_m=b_m}
for every choice of a_1, …, a_n, b_1, …, b_m.
(X,Y) is short-hand for P(X,Y) = P(X)·P(Y). This is called an independence statement.
Note that X and Y are disjoint sets of variables (X ∩ Y = ∅).
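The definition of an independence statement can be checked mechanically on a small joint distribution. The following is a minimal sketch and not part of the original slides: the function names `marginal` and `satisfies`, and the representation of P as a dictionary from value tuples to probabilities, are illustrative assumptions.

```python
def marginal(joint, names, subset):
    """Sum the joint distribution over every variable not in `subset`."""
    idx = [names.index(v) for v in subset]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def satisfies(joint, names, X, Y, tol=1e-12):
    """True iff P(X,Y) = P(X)*P(Y) for every assignment (hypothetical helper)."""
    X, Y = list(X), list(Y)
    pxy = marginal(joint, names, X + Y)
    px = marginal(joint, names, X)
    py = marginal(joint, names, Y)
    # Assignments missing from a marginal have probability 0 on both sides.
    return all(
        abs(pxy.get(ax + ay, 0.0) - px[ax] * py[ay]) <= tol
        for ax in px
        for ay in py
    )


names = ["x", "y"]
two_fair_coins = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
copied_coin = {(0, 0): 0.5, (1, 1): 0.5}  # y is always equal to x

print(satisfies(two_fair_coins, names, ["x"], ["y"]))  # True
print(satisfies(copied_coin, names, ["x"], ["y"]))     # False
```

The check enumerates only assignments whose marginals are non-zero, which is enough because both sides of the equality vanish elsewhere.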

4 Notations
σ: a specific independence statement of the form (X,Y).
Σ: a set of independence statements of the form (X,Y): Σ = {σ_1, …, σ_k}.
XY: short-hand notation for the union X ∪ Y.
"P satisfies σ = (X,Y)" means: P(X,Y) = P(X)·P(Y) for that specific P.

5 Soundness and Completeness
Definitions:
Σ ⊨ σ iff every distribution that satisfies Σ also satisfies σ.
Σ ⊢ σ iff σ ∈ cl(Σ), i.e. there exists a derivation chain σ_1, …, σ_n = σ s.t. for each σ_j, either σ_j ∈ Σ or σ_j is derived by an axiom from the previous statements.
For a set of axioms A:
Soundness: A is sound iff for every Σ and σ: Σ ⊢ σ ⇒ Σ ⊨ σ.
Completeness: A is complete iff for every Σ and σ: Σ ⊨ σ ⇒ Σ ⊢ σ.
Completeness, alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P_σ that satisfies cl(Σ) and does not satisfy σ.

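6 Independence Axioms
The four axioms used throughout these slides:
- Trivial independence: (X, ∅)
- Symmetry: (X,Y) ⇒ (Y,X)
- Decomposition: (X,YZ) ⇒ (X,Y)
- Mixing: (X,Y) ∧ (XY,Z) ⇒ (X,YZ)
We saw (in the 1st lecture) that axioms 1a-1d are sound (they always infer correctly). Today we'll show they are complete (they can derive every true statement).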
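The axioms named on this slide (trivial independence, symmetry, decomposition and mixing) can be written as small functions on statements. This is a sketch under assumptions: the slide's own formulas did not survive extraction, so the forms below follow how the axioms are applied later in the deck, and every name (`st`, `trivial`, `symmetry`, `decomposition`, `mixing`) is hypothetical.

```python
from typing import FrozenSet, Iterable, Tuple

Statement = Tuple[FrozenSet[str], FrozenSet[str]]


def st(left: Iterable[str], right: Iterable[str]) -> Statement:
    """Build a statement (X, Y); a string is read as one-letter variable names."""
    return (frozenset(left), frozenset(right))


def trivial(x: Iterable[str]) -> Statement:
    """Trivial independence: (X, empty set) always holds."""
    return (frozenset(x), frozenset())


def symmetry(s: Statement) -> Statement:
    """(X, Y) => (Y, X)."""
    return (s[1], s[0])


def decomposition(s: Statement, keep: Iterable[str]) -> Statement:
    """(X, YZ) => (X, Y), where `keep` is the part Y that is retained."""
    x, yz = s
    y = frozenset(keep)
    if not y <= yz:
        raise ValueError("keep must be a subset of the right-hand side")
    return (x, y)


def mixing(s1: Statement, s2: Statement) -> Statement:
    """(X, Y) and (XY, Z) => (X, YZ)."""
    x, y = s1
    xy, z = s2
    if xy != x | y:
        raise ValueError("left side of the second statement must equal X union Y")
    return (x, y | z)


derived = mixing(st("a", "b"), st("ab", "c"))
print(derived == st("a", "bc"))  # True
```

Statements are kept as pairs of frozensets so that they can be compared and stored in sets; disjointness of the two sides is assumed rather than checked.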

8 Minimal Statement
Definition: σ=(X,Y) ∉ cl(Σ) is minimal if for every non-empty X', Y' s.t. X' ⊆ X, Y' ⊆ Y, X'Y' ≠ XY we have (X',Y') ∈ cl(Σ).
For every σ=(X,Y) ∉ cl(Σ) we can find an appropriate minimal σ'=(X',Y') ∉ cl(Σ) through iterative decomposition.
Observation: P_σ satisfies σ ⇒ P_σ satisfies σ' (decomposition soundness). Therefore: P_σ doesn't satisfy σ' ⇒ P_σ doesn't satisfy σ.
Our plan: given an arbitrary σ ∉ cl(Σ), we will find a distribution P_σ that satisfies cl(Σ) but doesn't satisfy σ'. This will prove completeness (using the alternative completeness definition and the observation above).
To simplify notation, we will assume WLOG that σ=(X,Y) is already minimal.

9 Completeness Proof
Let σ=(X,Y) ∉ cl(Σ) be a minimal statement where X={x_1,…,x_n}, Y={y_1,…,y_m}, and Z={z_1,z_2,…,z_k} stands for the rest of the variables in U.
We will construct P_σ as follows:
All variables except x_1 are fair coins (probability ½ for each of their two values).
x_1 is defined as the sum modulo 2 of the other variables in X ∪ Y: x_1 = (x_2 + … + x_n + y_1 + … + y_m) mod 2.
Part 1: P_σ does not satisfy σ.
We will inspect the following scenario: x_1=1, all other variables are 0. Then
P_σ(x_1,…,x_n, y_1,…,y_m) = 0, while P_σ(x_1,…,x_n) = 0.5^n and P_σ(y_1,…,y_m) = 0.5^m, so
P_σ(x_1,…,x_n, y_1,…,y_m) ≠ P_σ(x_1,…,x_n)·P_σ(y_1,…,y_m).
Therefore, P_σ does not satisfy σ, as required.
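The construction of P_σ can be tried out on a tiny universe. The sketch below rests on an assumption: the slide's formula for x_1 is missing from this transcript, and the code takes x_1 to be the parity (sum modulo 2) of the other variables in X ∪ Y, which matches the probabilities 0, 0.5^n and 0.5^m quoted on the slide. The names `parity_distribution`, `marginal` and `satisfies` are hypothetical.

```python
from itertools import product


def parity_distribution(xs, ys, zs):
    """All variables except xs[0] are fair coins; xs[0] is the parity of the
    remaining variables of X and Y (assumed form of the construction)."""
    names = list(xs) + list(ys) + list(zs)
    n_parity_inputs = len(xs) + len(ys) - 1   # x_2..x_n and y_1..y_m
    n_free = len(names) - 1                   # every variable except x_1
    joint = {}
    for bits in product((0, 1), repeat=n_free):
        x1 = sum(bits[:n_parity_inputs]) % 2
        joint[(x1,) + bits] = 0.5 ** n_free
    return names, joint


def marginal(joint, names, subset):
    """Sum the joint distribution over every variable not in `subset`."""
    idx = [names.index(v) for v in subset]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def satisfies(joint, names, X, Y, tol=1e-12):
    """True iff P(X,Y) = P(X)*P(Y) for every assignment."""
    X, Y = list(X), list(Y)
    pxy, px, py = (marginal(joint, names, s) for s in (X + Y, X, Y))
    return all(
        abs(pxy.get(ax + ay, 0.0) - px[ax] * py[ay]) <= tol
        for ax in px
        for ay in py
    )


names, joint = parity_distribution(["x1", "x2"], ["y1"], ["z1"])
print(satisfies(joint, names, ["x1", "x2"], ["y1"]))  # False: sigma itself fails
print(satisfies(joint, names, ["x1"], ["y1"]))        # True: a proper sub-statement holds
print(satisfies(joint, names, ["x1", "x2"], ["z1"]))  # True: Z is independent of the rest
```

The three checks mirror Part 1 and the first two scenarios of Part 2: the full statement (X,Y) is violated, while statements that miss part of X ∪ Y, or that involve only Z on one side, hold.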

10 Completeness Proof (cont'd)
Part 2: P_σ satisfies cl(Σ).
Let (V,W) ∈ cl(Σ). We will show that P_σ(V,W) = P_σ(V)·P_σ(W). This is done by inspecting different scenarios.
Scenario 1: either V or W contains only elements of Z. We will assume WLOG that W contains only elements of Z. All variables in Z are independent under P_σ and therefore P_σ(V,W) = P_σ(V)·P_σ(W).
[Figure: the sets V and W drawn over the variables of X, Y and Z.]

11 Completeness Proof (cont'd)
Part 2: P_σ satisfies cl(Σ).
Let (V,W) ∈ cl(Σ). We will show that P_σ(V,W) = P_σ(V)·P_σ(W). This is done by inspecting different scenarios.
Scenario 2: both V and W contain elements of X ∪ Y, but V ∪ W doesn't contain all elements of X ∪ Y. Without full information about the assignments of the variables in X ∪ Y, x_1 could turn out to be 0 or 1 with probability ½, and therefore P_σ(V,W) = P_σ(V)·P_σ(W).
[Figure: the sets V and W drawn over the variables of X, Y and Z.]

12 Completeness Proof (cont'd)
Part 2: P_σ satisfies cl(Σ), continued.
Scenario 3: both V and W contain elements of X ∪ Y, and (X ∪ Y) ⊆ (V ∪ W). We will show a derivation chain for σ=(X,Y), contradicting our original assumption that σ ∉ cl(Σ).
Mark: (V,W) = (X_V Y_V Z_V, X_W Y_W Z_W) ∈ cl(Σ), where Y = Y_V Y_W, X = X_V X_W, Z_V Z_W ⊆ Z, V = X_V Y_V Z_V, W = X_W Y_W Z_W.
Remove all z's by decomposition: (X_V Y_V, X_W Y_W) ∈ cl(Σ).
Due to minimality of σ=(X,Y): (X_V,Y_V) ∈ cl(Σ) and (X_W,Y) ∈ cl(Σ).
(X_V,Y_V) ∧ (X_V Y_V, X_W Y_W) ⇒ (X_V, Y_V X_W Y_W) = (X_V, X_W Y), by mixing.
(X_W,Y) ∧ (X_W Y, X_V) ⇒ (Y, X_V X_W) = (Y,X), by symmetry and mixing, where (X_W Y, X_V) is the previous result up to symmetry.
(Y,X) is σ up to symmetry, so σ ∈ cl(Σ), a contradiction.
[Figure: the sets V and W drawn over the variables of X, Y and Z.]

13 Completeness Proof: Summary
Reminder: Completeness, alternative definition: A is complete iff for every Σ and every σ ∉ cl(Σ) there exists a distribution P_σ that satisfies cl(Σ) and does not satisfy σ.
We've shown: given a minimal σ ∉ cl(Σ), there exists a distribution P_σ that obeys:
1. P_σ does not satisfy σ.
2. P_σ satisfies Σ.
Given a non-minimal σ ∉ cl(Σ), we will derive its minimal statement σ', and devise a distribution P_σ' that satisfies Σ but does not satisfy σ'. Due to soundness of decomposition, P_σ' cannot satisfy σ either.

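14 Scope of Completeness
The proof uses P_σ, a binary p.d. (probability distribution). Therefore the axioms are complete for every class of distributions that contains the binary p.d.'s: binary p.d.'s, discrete p.d.'s, and all p.d.'s over U.
However, for normal p.d.'s, which do not include P_σ, the axiom set 1a-1d is not complete; a stronger axiom is required in place of one of them. [The axiom being replaced and its replacement are missing from this transcript.]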

16 Some more Definitions and Tools
Definition: Span
span(σ): the set of elements represented in statement σ. Example: span(x_1 x_2, x_3 x_4) = {x_1, x_2, x_3, x_4}
span(Σ): the set of elements represented in all statements of Σ. Example: span({(x_1,x_2), (x_1,x_3)}) = {x_1, x_2, x_3}

17 Some more Definitions and Tools
Definition: Projection
The projection of σ on X, denoted σ(X), is the statement derived from σ by removing all elements not in X from σ.
Example: if σ = (x_1 x_2 x_3, x_4 x_5) and X = {x_2, x_3, x_4} then σ(X) = (x_2 x_3, x_4).
The projection of Σ on X, denoted Σ(X), is {σ(X) | σ ∈ Σ}.
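Span and projection translate directly into set operations. The following is a small sketch with hypothetical names (`span`, `span_of_set`, `project`, `project_set`), representing a statement as a pair of frozensets; it reproduces the two examples given on the slides.

```python
def span(statement):
    """All variables that appear in a statement (X, Y)."""
    left, right = statement
    return left | right


def span_of_set(statements):
    """All variables that appear in any statement of the set."""
    result = frozenset()
    for s in statements:
        result |= span(s)
    return result


def project(statement, keep):
    """Remove from both sides every variable that is not in `keep`."""
    left, right = statement
    keep = frozenset(keep)
    return (left & keep, right & keep)


def project_set(statements, keep):
    """Project every statement of the set on `keep`."""
    return {project(s, keep) for s in statements}


sigma = (frozenset({"x1", "x2", "x3"}), frozenset({"x4", "x5"}))
print(project(sigma, {"x2", "x3", "x4"}))
print(span_of_set([
    (frozenset({"x1"}), frozenset({"x2"})),
    (frozenset({"x1"}), frozenset({"x3"})),
]))
```

A projected statement may end up with an empty side, in which case it is a trivial statement.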

18 Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ)).
(⇐) If Σ' ⊢ σ then clearly Σ ⊢ σ, because all the statements in Σ' can be derived from the statements in Σ by decomposition.

19 Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ)), s = span(σ).
(⇒) If Σ ⊢ σ then there is a derivation chain for σ: σ_1, σ_2, …, σ_k. For each σ_j: if σ_i ⇒ σ_j with i<j (by symmetry or decomposition), then σ_i(s) ⇒ σ_j(s) by symmetry or decomposition respectively. Similarly, if σ_j is derived from σ_i and σ_l by mixing, then σ_j(s) is derived from σ_i(s), σ_l(s) by mixing. ∎

20 Some more Definitions and Tools
Projection Lemma: Σ ⊢ σ iff Σ' ⊢ σ, where Σ' = Σ(span(σ)), s = span(σ).
Observations from the projection lemma:
Variables not in σ are unnecessary for determining whether Σ ⊢ σ.
The problem of verifying whether Σ ⊢ σ can be simplified to the problem of verifying whether Σ' ⊢ σ, where Σ' = Σ(span(σ)). This problem can be solved with a possibly reduced time and space complexity.

21 Conditions for Inference of Independence
Main claim: for a given Σ, σ we have Σ' ⊢ σ iff:
1. σ is trivial: σ = (X, ∅) (up to symmetry), OR
2. σ is in Σ': σ ∈ Σ' (up to symmetry), OR
3. σ is derivable from Σ': there exists σ' ∈ Σ' s.t. span(σ) = span(σ'), and for σ' = (AP,BQ), σ = (AQ,BP) (A, B, Q, P may be empty): Σ' ⊢ (A,P) and Σ' ⊢ (B,Q) (up to symmetry).

22 Proof of Main Claim
(⇐) If 1. σ is trivial or 2. σ is in Σ' (both up to symmetry), then the proof is immediate.
Otherwise, 3. there exists σ' ∈ Σ' s.t. span(σ) = span(σ'), and for σ' = (AP,BQ), σ = (AQ,BP): Σ' ⊢ (A,P) and Σ' ⊢ (B,Q). We will show a constructive proof under these conditions.

23 Proof of Main Claim
(⇐) (contd.) Given that Σ' ⊢ (AP,BQ), Σ' ⊢ (A,P), Σ' ⊢ (B,Q), with symmetry applied where needed:
1. (A,P) ∧ (AP,BQ) ⇒ (A,PBQ), by mixing.
2. (B,Q) ∧ (AP,BQ) ⇒ (APB,Q), by mixing; then (APB,Q) ⇒ (PB,Q), by decomposition.
3. (PB,Q) ∧ (A,PBQ) ⇒ (AQ,PB) = (AQ,BP) = σ, by mixing.
We've proven this direction.
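The three-step derivation can be replayed mechanically, which makes the implicit uses of symmetry visible. This is an illustrative sketch only: the helper names (`sym`, `dec`, `mix`, `derive_sigma`) are hypothetical, and the axioms are encoded in the forms (X,Y) ⇒ (Y,X), (X,YZ) ⇒ (X,Y) and (X,Y) ∧ (XY,Z) ⇒ (X,YZ) that the deck's derivations rely on.

```python
def sym(s):
    """Symmetry: (X, Y) => (Y, X)."""
    return (s[1], s[0])


def dec(s, keep):
    """Decomposition: (X, YZ) => (X, Y) with Y = keep."""
    x, yz = s
    if not keep <= yz:
        raise ValueError("keep must be a subset of the right-hand side")
    return (x, keep)


def mix(s1, s2):
    """Mixing: (X, Y) and (XY, Z) => (X, YZ)."""
    x, y = s1
    xy, z = s2
    if xy != x | y:
        raise ValueError("second statement must have X union Y on its left")
    return (x, y | z)


def derive_sigma(A, P, B, Q):
    """Derive (AQ, BP) from (AP, BQ), (A, P) and (B, Q), following steps 1-3."""
    sigma_prime = (A | P, B | Q)
    step1 = mix((A, P), sigma_prime)                    # (A, PBQ)
    wide = mix(sym((B, Q)), sym(sigma_prime))           # (Q, BAP)
    step2 = sym(dec(wide, P | B))                       # (PB, Q)
    step3 = sym(mix(step2, sym(step1)))                 # (AQ, PB)
    return step3


A, P, B, Q = (frozenset(c) for c in "apbq")
print(derive_sigma(A, P, B, Q) == (A | Q, B | P))  # True
```

Each `mix` call raises an error if its premises do not fit the axiom, so a successful run confirms that every step is a legal application.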

24 Proof of Main Claim
(⇒) Given Σ' ⊢ σ, if 1. σ is trivial or 2. σ is in Σ' (both up to symmetry), then the proof is immediate.
Otherwise, since no axiom can add new variables to a statement, there must exist σ' ∈ Σ' s.t. span(σ) = span(σ') in the derivation chain of σ.
Also, by decomposition: σ = (AQ,BP) ⇒ (A,P) and σ = (AQ,BP) ⇒ (Q,B). ∎

25 Conclusions from Claim
We've seen that, after discarding unneeded variables, it is possible to tell whether Σ' ⊢ σ (when it's not immediately obvious) by:
a. Finding another statement σ' ∈ Σ' for which span(σ) = span(σ'),
b. Verifying that Σ' ⊢ (A,P) and Σ' ⊢ (B,Q), where σ' = (AP,BQ) and σ = (AQ,BP).
This suggests using a recursive "divide and conquer" approach.

27 The Membership Algorithm
Procedure Find(Σ, σ):
1. Set Σ' := Σ(span(σ)).
2. If σ is trivial, or σ ∈ Σ' (up to symmetry), then Find(Σ, σ) := TRUE.
3. Else if for all non-trivial σ' ∈ Σ': span(σ) ≠ span(σ'), then Find(Σ, σ) := FALSE.
4. Else there exists σ' ∈ Σ' with span(σ) = span(σ'), σ' = (AP,BQ) and σ = (AQ,BP); set σ_1 := (A,P), σ_2 := (B,Q), and Find(Σ, σ) := Find(Σ', σ_1) ∧ Find(Σ', σ_2).
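The procedure can be sketched in a few lines of Python. This is an illustrative reading of the four steps, not the authors' code: statements are pairs of disjoint frozensets, all names (`find`, `project`, `is_trivial`, `same`) are hypothetical, and the first statement σ' with a matching span is used, on the assumption (suggested by the main claim) that the choice does not affect the answer.

```python
def span(s):
    return s[0] | s[1]


def is_trivial(s):
    return not s[0] or not s[1]


def project(s, keep):
    return (s[0] & keep, s[1] & keep)


def same(s, t):
    """Equality of statements up to symmetry."""
    return s == t or s == (t[1], t[0])


def find(statements, sigma):
    """Return True iff sigma is derivable from `statements` (sketch of Find)."""
    target = span(sigma)
    projected = [project(s, target) for s in statements]            # step 1
    if is_trivial(sigma) or any(same(sigma, s) for s in projected):  # step 2
        return True
    left, right = sigma
    for v, w in projected:
        if is_trivial((v, w)) or (v | w) != target:
            continue
        # sigma' = (V, W) = (AP, BQ) and sigma = (AQ, BP), so
        # A = left & V, P = right & V, B = right & W, Q = left & W.
        sigma1 = (left & v, right & v)   # (A, P)
        sigma2 = (right & w, left & w)   # (B, Q)
        return find(projected, sigma1) and find(projected, sigma2)   # step 4
    return False                                                     # step 3


def st(left, right):
    return (frozenset(left), frozenset(right))


base = [st("a", "b"), st("ab", "c")]
print(find(base, st("a", "bc")))                        # True: follows by mixing
print(find([st("a", "b"), st("a", "c")], st("a", "bc")))  # False: no statement spans a, b, c
```

The recursion terminates because σ' is non-trivial, so both σ_1 and σ_2 have a strictly smaller span than σ.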

28 Algorithm Correctness Proof
We will prove that Find(Σ, σ) = TRUE ⇔ σ ∈ cl(Σ) by induction on k = |σ|.
Induction base: if k=1 then σ is trivial, therefore the algorithm will return TRUE in step 2, and σ ∈ cl(Σ).

29 Algorithm Correctness Proof
Induction assumption: Find(Σ, σ') = TRUE ⇔ σ' ∈ cl(Σ) for each σ' with |σ'| < k.
Induction step: Find(Σ, σ) = TRUE iff either:
1. Step 2 returns TRUE ⇔ σ is trivial or σ ∈ Σ' ⇒ σ ∈ cl(Σ).
2. Step 4 returns TRUE
iff Find(Σ', σ_1) = TRUE ∧ Find(Σ', σ_2) = TRUE (according to the algorithm's definition)
iff σ_1 ∈ cl(Σ') ∧ σ_2 ∈ cl(Σ') (according to the induction assumption)
iff σ ∈ cl(Σ') (according to the main claim)
iff σ ∈ cl(Σ) (according to the projection lemma). ∎

30 Complexity Analysis
Definitions: n = the number of distinct variables in Σ ∪ {σ}. k = the number of distinct variables in {σ}.
First projection cost: O(|Σ|·n), which happens only once.
Recursive step: T(k) ≤ |Σ|·k + T(k_1) + T(k_2), where k_1 + k_2 = k, k_1 = |σ_1|, k_2 = |σ_2|.
Can be shown by induction: T(k) ≤ |Σ|·k·(depth of recursion).
Worst case analysis: T(k) ≤ |Σ|·k·k = |Σ|·k².
Total run time is bounded by O(|Σ|·n + |Σ|·k²), which is also O(|Σ|·n²) since k ≤ n.
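One way to read the "can be shown by induction" step is to sum the recurrence level by level. The derivation below is a sketch of that reading, with k_i denoting the span sizes of the statements handled at one recursion level (a symbol introduced here for illustration):

```latex
\begin{align*}
T(k) &\le |\Sigma|\,k + T(k_1) + T(k_2), \qquad k_1 + k_2 = k, \quad k_1, k_2 < k,\\
\text{cost of one recursion level} &\le |\Sigma| \sum_i k_i \;\le\; |\Sigma|\,k
  \quad (\text{the spans at one level are disjoint subsets of } \operatorname{span}(\sigma)),\\
\text{depth of recursion} &\le k
  \quad (\text{each call strictly shrinks the span}),\\
T(k) &\le |\Sigma|\,k \cdot k \;=\; |\Sigma|\,k^{2}.
\end{align*}
```

The bound k_1, k_2 < k holds because σ' is non-trivial, so neither (A,P) nor (B,Q) can cover all of span(σ).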

31 Improvements and Variations
Instead of arbitrarily choosing σ', find one whose sub-statements {A, B, P, Q} have balanced size (this can improve run-time complexity).
Using the derivation chain presented in the constructive proof, the algorithm can also return a derivation chain for σ with a length of O(k).

32 Variations (contd.)
The algorithm can be expanded into a polynomial algorithm for the following problems:
Given two sets Σ_1 and Σ_2, is cl(Σ_1) ⊆ cl(Σ_2)? Is cl(Σ_1) = cl(Σ_2)?
Minimize the size of Σ while preserving cl(Σ): start with a maximal-size statement and remove from Σ all statements derivable from it. Repeat with the next largest statement, etc.
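The closure questions reduce to repeated membership tests: cl(Σ_1) ⊆ cl(Σ_2) holds exactly when every statement of Σ_1 is derivable from Σ_2, because the closure is monotone and idempotent. The sketch below assumes a membership oracle `derives(statements, sigma)` is supplied by the caller; the function names are hypothetical, and `toy_derives` is only a stand-in that knows trivial independence, symmetry and decomposition (no mixing), included so the example runs on its own.

```python
def closure_included(sigma1, sigma2, derives):
    """cl(sigma1) is a subset of cl(sigma2) iff sigma2 derives each statement of sigma1."""
    return all(derives(sigma2, s) for s in sigma1)


def closure_equal(sigma1, sigma2, derives):
    """cl(sigma1) = cl(sigma2) iff inclusion holds in both directions."""
    return (closure_included(sigma1, sigma2, derives)
            and closure_included(sigma2, sigma1, derives))


def toy_derives(statements, sigma):
    """Stand-in oracle: trivial statements, symmetry and decomposition only."""
    x, y = sigma
    if not x or not y:
        return True
    return any(
        (x <= left and y <= right) or (x <= right and y <= left)
        for left, right in statements
    )


def st(left, right):
    return (frozenset(left), frozenset(right))


small = [st("a", "b")]
large = [st("a", "bc")]
print(closure_included(small, large, toy_derives))  # True
print(closure_equal(small, large, toy_derives))     # False
```

With a full membership procedure passed as `derives`, each question costs one membership test per statement, which keeps the overall running time polynomial.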

Thank you!