
Presentation transcript:

1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof (⇐): If each variable X is conditionally independent of all its non-descendants given its parents, then, by decomposition, X is also independent of its predecessors in a particular topological order d, given its parents. So the parents of each variable form a boundary of it in order d, which, as in the boundary DAG construction, makes D an I-map of P. Proof (⇒): X is d-separated from all its non-descendants, given its parents. Since D is an I-map, by the soundness theorem the claim holds.
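Corollary 4 refers to the local statement I(X, Pa(X), NonDesc(X) \ Pa(X)) for each node X. The sketch below lists these statements for a DAG. The DAG is a hypothetical four-node example (A → B, A → C, B → D, C → D). The helpers `descendants` and `local_markov_statements` are illustrative names, not part of any library.

```python
# Hypothetical DAG, given as node -> list of parents
dag = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def descendants(dag, x):
    """All nodes reachable from x by following parent -> child edges."""
    children = {v: [c for c, ps in dag.items() if v in ps] for v in dag}
    seen, stack = set(), [x]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def local_markov_statements(dag):
    """Map X -> (Pa(X), NonDesc(X) minus Pa(X)), i.e. the statement I(X, Pa(X), rest)."""
    stmts = {}
    for x, parents in dag.items():
        nondesc = set(dag) - descendants(dag, x) - {x}
        stmts[x] = (frozenset(parents), frozenset(nondesc - set(parents)))
    return stmts

for x, (pa, rest) in sorted(local_markov_statements(dag).items()):
    print(f"I({x}, {sorted(pa)}, {sorted(rest)})")
```

For this graph, the output includes I(B, ['A'], ['C']) and I(D, ['B', 'C'], ['A']).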

2 COROLLARY 5: If D = (U, E) is a boundary DAG of P constructed in some order d, then any topological order d' of U yields the same boundary DAG of P. (Hence the construction order can be forgotten.) Proof: Each variable X is d-separated from all its non-descendants, given its parents, in the boundary DAG of P. In particular, due to decomposition, X is independent, given its parents, of all previous variables in any topological order d'.
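The proof rests on one step. In any topological order d', the predecessors of X contain Pa(X) and consist only of non-descendants of X, so decomposition applies. The sketch below checks that inclusion by brute force on a hypothetical DAG. It enumerates every topological order with `itertools.permutations`, which is only feasible for tiny graphs.

```python
from itertools import permutations

# Hypothetical DAG: node -> list of parents (A -> B, A -> C, B -> D, C -> D)
dag = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}

def descendants(dag, x):
    """All nodes reachable from x by following parent -> child edges."""
    children = {v: [c for c, ps in dag.items() if v in ps] for v in dag}
    seen, stack = set(), [x]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def topological_orders(dag):
    """All orders in which every parent precedes its children (brute force)."""
    for order in permutations(dag):
        pos = {v: i for i, v in enumerate(order)}
        if all(pos[p] < pos[x] for x, ps in dag.items() for p in ps):
            yield order

def parents_within_predecessors(dag):
    """Check Pa(X) <= predecessors(X) <= NonDesc(X) in every topological order."""
    for order in topological_orders(dag):
        for i, x in enumerate(order):
            preds = set(order[:i])
            nondesc = set(dag) - descendants(dag, x) - {x}
            if not (set(dag[x]) <= preds <= nondesc):
                return False
    return True

print(list(topological_orders(dag)))     # [('A', 'B', 'C', 'D'), ('A', 'C', 'B', 'D')]
print(parents_within_predecessors(dag))  # True
```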

3 Markov Blankets in DAGs. Extension of the Markov chain property: if $I(X_k, X_{k-1}, \{X_1,\dots,X_{k-2}\})$ holds for every $k$, then $I(X_k, \{X_{k-1}, X_{k+1}\}, \{X_1,\dots,X_{k-2}, X_{k+2},\dots,X_n\})$ holds for every $k$. This follows from the soundness theorem. The converse holds when intersection is assumed.
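The sketch below is a numeric sanity check, not a proof. It builds a random binary Markov chain X_1 → … → X_5 and tests the extended statement I(X_k, {X_{k-1}, X_{k+1}}, rest) for every k. The check compares P(a, b, c) P(c) with P(a, c) P(b, c). The random transition tables are drawn with NumPy purely for illustration. The helper `cond_indep(joint, a, b, c)` is hypothetical: it reads "A is independent of B given C", with each argument a set of axis indices.

```python
import numpy as np

def chain_joint(n, rng):
    """Joint table (one axis per variable) of a random binary chain X1 -> ... -> Xn."""
    joint = rng.dirichlet([1.0, 1.0])                 # P(X1)
    for _ in range(n - 1):
        trans = rng.dirichlet([1.0, 1.0], size=2)     # trans[a, b] = P(next = b | prev = a)
        joint = joint[..., None] * trans              # new last axis for the next variable
    return joint

def cond_indep(joint, a, b, c):
    """True if A is independent of B given C (a, b, c are sets of axis indices)."""
    def marg(keep):
        drop = tuple(i for i in range(joint.ndim) if i not in keep)
        return joint.sum(axis=drop, keepdims=True)
    return bool(np.allclose(marg(a | b | c) * marg(c), marg(a | c) * marg(b | c)))

rng = np.random.default_rng(0)
n = 5
joint = chain_joint(n, rng)
results = []
for k in range(n):
    blanket = {j for j in (k - 1, k + 1) if 0 <= j < n}   # neighbours of X_{k+1}
    rest = set(range(n)) - blanket - {k}
    results.append(cond_indep(joint, {k}, rest, blanket))
print(results)  # every entry should be True
```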

4 Consequence: (1) There is no improvement to d-separation, and (2) no statement escapes graphical representation. Reasoning: (1) If there were an independence statement σ not shown by d-separation, then σ would have to be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and its consequences but violates σ. (2) Same argument. [Note that (2) is a stronger claim.]

5 Bayesian Networks. Some slides have been edited from Nir Friedman's lectures; changes made by Dan Geiger. Background reading: An Introduction to Bayesian Networks, Finn Jensen, UCL Press, 1997.

6 The "Visit-to-Asia" Example. [Figure: network over Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Abnormality in Chest (A), Bronchitis (B), X-Ray (X), Dyspnea (D).] What are the relevant variables and their dependencies?

7 Verifying the (in)dependencies. We can now ask the expert: do the following assertions hold? I(S; V), I(T; S | V), I(L; {T, V} | S), …, I(X; {V, S, T, L, B, D} | A). Alternative verification: does each variable become independent of the rest, given its Markov boundary? Take-home question: are other variable construction orders as good?
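These assertions can also be checked mechanically by testing d-separation in the Asia DAG. The sketch below assumes the edges V → T, S → L, S → B, T → A, L → A, A → X, A → D and B → D, read off the network's local distributions p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b). It uses the moralized ancestral graph criterion:

1. Restrict the graph to the ancestors of X ∪ Y ∪ Z.
2. Marry parents that share a child, then drop edge directions.
3. Delete Z.

X and Y are d-separated by Z iff they are disconnected after these steps.

```python
from collections import deque

# Asia network, node -> parents (assumed from the local distributions of the network)
ASIA = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
        "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}

def ancestors(dag, nodes):
    """The given nodes together with all their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in dag[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def d_separated(dag, xs, ys, zs):
    """True if xs and ys are d-separated by zs (moralized ancestral graph test)."""
    keep = ancestors(dag, set(xs) | set(ys) | set(zs))
    adj = {v: set() for v in keep}
    for child in keep:
        ps = dag[child]
        for p in ps:                         # undirected version of each edge
            adj[p].add(child)
            adj[child].add(p)
        for i, p in enumerate(ps):           # marry parents of a common child
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    blocked = set(zs)
    seen = set(xs) - blocked
    queue = deque(seen)
    while queue:                             # BFS from xs that never enters zs
        v = queue.popleft()
        if v in ys:
            return False
        for w in adj[v] - blocked:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True

print(d_separated(ASIA, {"S"}, {"V"}, set()))                            # True
print(d_separated(ASIA, {"T"}, {"S"}, {"V"}))                            # True
print(d_separated(ASIA, {"L"}, {"T", "V"}, {"S"}))                       # True
print(d_separated(ASIA, {"X"}, {"V", "S", "T", "L", "B", "D"}, {"A"}))   # True
```

The same function shows "explaining away": `d_separated(ASIA, {"T"}, {"L"}, {"A"})` is False, because observing the common child A couples its parents.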

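8 Quantifying the Bayesian Network. Bayesian network = Directed Acyclic Graph (DAG), annotated with conditional probability distributions. For the Asia network (V, S, L, T, A, B, X, D), the local distributions are p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a) and p(d|a,b), so the joint factors as $p(v,s,t,l,a,b,x,d) = p(v)\,p(s)\,p(t \mid v)\,p(l \mid s)\,p(b \mid s)\,p(a \mid t,l)\,p(x \mid a)\,p(d \mid a,b)$.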
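Writing the joint as a product of these eight local distributions is what makes the network compact. The sketch below counts free parameters, assuming every variable is binary (yes/no). It compares the network's count with the size of an unrestricted joint table. `num_free_parameters` is a hypothetical helper.

```python
# Asia network, node -> parents (from the factorization above)
ASIA = {"V": [], "S": [], "T": ["V"], "L": ["S"], "B": ["S"],
        "A": ["T", "L"], "X": ["A"], "D": ["A", "B"]}

def num_free_parameters(dag, card=2):
    """Each CPT has card**|parents| rows, each with card - 1 free entries."""
    return sum((card - 1) * card ** len(ps) for ps in dag.values())

full_joint = 2 ** len(ASIA) - 1   # free entries of an unrestricted joint table
print(num_free_parameters(ASIA), full_joint)  # 18 255
```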

9 Local distributions. Conditional Probability Table p(A | T, L), where L, T and A each take the values yes/no: p(A=y | L=n, T=n) = 0.02; p(A=y | L=n, T=y) = 0.60; p(A=y | L=y, T=n) = 0.99; p(A=y | L=y, T=y) = 0.99. Asymmetric independence in the CPT: given L = yes, A does not depend on T, whereas given L = no it does.
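The asymmetric (context-specific) independence can be read directly off the table. When L = yes, the two rows for T coincide, so T carries no information about A in that context. The sketch below keys the CPT by (L, T). Because A is binary, comparing p(A = yes | ·) suffices. `a_independent_of_t_given` is a hypothetical helper.

```python
# p(A = yes | L, T) from the slide, keyed by (L, T)
P_A_YES = {("n", "n"): 0.02, ("n", "y"): 0.60, ("y", "n"): 0.99, ("y", "y"): 0.99}

def a_independent_of_t_given(l_value, cpt=P_A_YES):
    """True if p(A | L = l_value, T) is the same for both values of T."""
    return cpt[(l_value, "n")] == cpt[(l_value, "y")]

print(a_independent_of_t_given("y"))  # True: with L = yes, T is irrelevant to A
print(a_independent_of_t_given("n"))  # False: with L = no, T changes p(A = yes)
```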

10 Queries. There are several types of queries; most involve evidence. An evidence e is an assignment of values to a set E of variables in the domain. Example, a posteriori belief: P(D = yes | V = yes), or in general P(H = h | E = e), where H and E are subsets of variables. This is equivalent to computing P(H = h, E = e) and then dividing by P(E = e).
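Inference by enumeration makes the recipe concrete: compute P(H = h, E = e), then divide by P(E = e). The sketch below uses only the fragment T → A ← L. The CPT for A is the one from slide 9. The priors P(T) and P(L) are hypothetical numbers chosen for illustration, not values from these slides.

```python
from itertools import product

# Hypothetical priors (illustrative only); p(A = yes | L, T) from slide 9, keyed by (L, T)
P_T = {"y": 0.01, "n": 0.99}
P_L = {"y": 0.05, "n": 0.95}
P_A_YES = {("n", "n"): 0.02, ("n", "y"): 0.60, ("y", "n"): 0.99, ("y", "y"): 0.99}

def joint(t, l, a):
    """P(T = t, L = l, A = a) = P(t) P(l) P(a | t, l)."""
    pa = P_A_YES[(l, t)]
    return P_T[t] * P_L[l] * (pa if a == "y" else 1.0 - pa)

def query(h, e):
    """P(H = h | E = e) by enumeration; h and e map variable names to values."""
    def prob(assignment):
        total = 0.0
        for t, l, a in product("yn", repeat=3):
            world = {"T": t, "L": l, "A": a}
            if all(world[k] == v for k, v in assignment.items()):
                total += joint(t, l, a)
        return total
    return prob({**h, **e}) / prob(e)

print(query({"T": "y"}, {"A": "y"}))  # diagnostic: P(T = yes | A = yes)
print(query({"A": "y"}, {"T": "y"}))  # predictive: P(A = yes | T = yes)
```

With these illustrative priors, the diagnostic query gives about 0.084 and the predictive one gives 0.6195. The same function answers both directions.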

11 A posteriori belief. This query is useful in many cases. Prediction: what is the probability of an outcome, given the starting condition? Diagnosis: what is the probability of a disease/fault, given the symptoms?

12 Example: Predictive + Diagnostic. P(T = Yes | Visit_to_Asia = Yes, Dyspnea = Yes). Here V is upstream of T (predictive evidence), while D is downstream of T (diagnostic evidence). Probabilistic inference can combine evidence from all parts of the network, diagnostic and predictive, regardless of the directions of edges in the model.

13 Queries: MAP  Find the maximum a posteriori assignment for some variable of interest (say H 1,…,H l )  That is, h 1,…,h l maximize the conditional probability P(h 1,…,h l | e)  Equivalent to maximizing the joint P(h 1,…,h l, e)

14 Queries: MAP. We can use MAP for explanation. What is the most likely joint event, given the evidence (e.g., a set of likely diseases given the symptoms)? What is the most likely scenario, given the evidence (e.g., a series of likely malfunctions that trigger a fault)? [Figure: diagnosis network with fault nodes D1, ..., D4 and symptom nodes S1, ..., S4; car example labels: Dead battery, Not charging, Bad battery, Bad magneto, Bad alternator.]

15 How Expressive are Bayesian Networks? 1. Check the diamond example via all boundary bases. 2. The following property holds for d-separation but does not hold for conditional independence: $I_D(X,\emptyset,Y)$ and $I_D(X,\gamma,Y) \Rightarrow I_D(X,\emptyset,\gamma)$ or $I_D(\gamma,\emptyset,Y)$, where $\gamma$ is a single variable.
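The sketch below shows why the property fails for conditional independence, using a counterexample constructed for illustration:

- γ is uniform over three values.
- X and Y are binary and conditionally independent given γ.
- The parameters are tuned so that X and Y are also marginally independent.

Both premises hold, yet both conclusions fail. `cond_indep` is a hypothetical helper reading "A is independent of B given C" over sets of axis indices.

```python
import numpy as np

def cond_indep(joint, a, b, c):
    """True if A is independent of B given C (a, b, c are sets of axis indices)."""
    def marg(keep):
        drop = tuple(i for i in range(joint.ndim) if i not in keep)
        return joint.sum(axis=drop, keepdims=True)
    return bool(np.allclose(marg(a | b | c) * marg(c), marg(a | c) * marg(b | c)))

# Hypothetical parameters: gamma uniform on {0, 1, 2}
p_gamma = np.full(3, 1 / 3)
p_x1 = np.array([0.2, 0.8, 0.5])            # P(X = 1 | gamma)
p_y1 = np.array([0.3, 0.3, 0.9])            # P(Y = 1 | gamma)
px = np.stack([1 - p_x1, p_x1], axis=1)     # px[g, x] = P(X = x | gamma = g)
py = np.stack([1 - p_y1, p_y1], axis=1)     # py[g, y] = P(Y = y | gamma = g)
joint = np.einsum("g,gx,gy->xyg", p_gamma, px, py)   # axes: X, Y, gamma

X, Y, G = 0, 1, 2
checks = [
    cond_indep(joint, {X}, {Y}, set()),   # premise    I(X, {}, Y)
    cond_indep(joint, {X}, {Y}, {G}),     # premise    I(X, gamma, Y)
    cond_indep(joint, {X}, {G}, set()),   # conclusion I(X, {}, gamma)
    cond_indep(joint, {G}, {Y}, set()),   # conclusion I(gamma, {}, Y)
]
print(checks)  # [True, True, False, False]
```

The marginal independence holds because P(X = 1, Y = 1) = (0.06 + 0.24 + 0.45) / 3 = 0.25 = P(X = 1) P(Y = 1).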

16 Drawback: Interpreting the links is not simple. Another drawback is the difficulty with extreme probabilities. There is no local test for I-mapness. Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks.

17 Decomposable Models. Example: Markov chains and Markov trees. Assume the chain $x_1 - x_2 - x_3 - x_4$ is an I-map of some $P(x_1,x_2,x_3,x_4)$ and was constructed using the methods we just described. The "compatibility functions" on all links can be easily interpreted in the case of chains. The same holds for trees, and the idea actually works for all chordal graphs.
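Assume the chain order is $x_1 - x_2 - x_3 - x_4$. Then the link functions can be read as marginals: $P(x_1,x_2,x_3,x_4) = \frac{P(x_1,x_2)P(x_2,x_3)P(x_3,x_4)}{P(x_2)P(x_3)}$. The sketch below checks this identity numerically on a randomly generated binary chain. The random parameters are an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Random binary chain x1 -> x2 -> x3 -> x4 (illustrative parameters)
p1 = rng.dirichlet([1.0, 1.0])
t12, t23, t34 = [rng.dirichlet([1.0, 1.0], size=2) for _ in range(3)]
joint = np.einsum("a,ab,bc,cd->abcd", p1, t12, t23, t34)

# Link (pairwise) marginals and separator (single-variable) marginals
p12 = joint.sum(axis=(2, 3))
p23 = joint.sum(axis=(0, 3))
p34 = joint.sum(axis=(0, 1))
p2 = joint.sum(axis=(0, 2, 3))
p3 = joint.sum(axis=(0, 1, 3))

numerator = np.einsum("ab,bc,cd->abcd", p12, p23, p34)
denominator = p2[None, :, None, None] * p3[None, None, :, None]
rebuilt = numerator / denominator
print(np.allclose(joint, rebuilt))  # True
```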

18 Chordal Graphs

19 Interpretation of the links. [Figure: three cliques, labelled Clique 1, Clique 2, Clique 3.] A probability distribution that can be written as a product of low-order marginals divided by a product of low-order marginals is said to be decomposable.
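The general form below is a standard statement, given here as a sketch. Suppose the maximal cliques $C_1,\dots,C_m$ of a chordal graph are arranged in a join tree $\mathcal{T}$, with separators $S_{ij} = C_i \cap C_j$ on its edges. Then a decomposable distribution factors as shown. The second line assumes the three cliques of the lost figure form a chain.

```latex
P(x) \;=\; \frac{\prod_{i=1}^{m} P(x_{C_i})}{\prod_{(i,j) \in \mathcal{T}} P(x_{S_{ij}})},
\qquad S_{ij} = C_i \cap C_j

P(x) \;=\; \frac{P(x_{C_1})\,P(x_{C_2})\,P(x_{C_3})}{P(x_{S_{12}})\,P(x_{S_{23}})}
\qquad \text{(three cliques in a chain)}
```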

20 Importance of Decomposability. When assigning compatibility functions, it suffices to use marginal probabilities on cliques and just make sure they are locally consistent. Marginals can be assessed from experts or estimated directly from data.

21 The Diamond Example: the smallest non-chordal graph. Adding one more link turns it into a chordal graph. Turning a general undirected graph into a chordal graph in some optimal way is the key to all exact computations done on Markov and Bayesian networks.
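The sketch below tests chordality with a simple rule: a graph is chordal iff one can repeatedly delete a simplicial vertex (one whose neighbours are pairwise adjacent) until nothing remains. It applies the rule to the diamond before and after adding a chord. The node labels are hypothetical. The repeated scan is chosen for clarity, not speed; maximum cardinality search is the usual linear-time alternative.

```python
def is_chordal(edges):
    """Chordality test by repeated deletion of simplicial vertices."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    while adj:
        simplicial = next(
            (v for v, nb in adj.items()
             if all(b in adj[a] for a in nb for b in nb if a != b)),
            None,
        )
        if simplicial is None:          # no simplicial vertex left: not chordal
            return False
        for u in adj.pop(simplicial):
            adj[u].discard(simplicial)
    return True

diamond = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]   # 4-cycle A-B-D-C-A
print(is_chordal(diamond))                  # False: no vertex is simplicial
print(is_chordal(diamond + [("B", "C")]))   # True: one chord suffices
```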


23 Complexity of Inference. Theorem: Computing P(X = x) in a Bayesian network is NP-hard. Main idea: conditional probability tables with zeros and ones are equivalent to logical gates. Hence a reduction from 3-SAT is the easiest to pursue.

24 Proof. We reduce 3-SAT to Bayesian network computation. Assume we are given a 3-SAT problem: let $Q_1,\dots,Q_n$ be propositions and $\varphi_1,\dots,\varphi_k$ be clauses, such that $\varphi_i = l_{i1} \vee l_{i2} \vee l_{i3}$, where each $l_{ij}$ is a literal over $Q_1,\dots,Q_n$ (e.g., $Q_1 = \text{true}$), and let $\varphi = \varphi_1 \wedge \dots \wedge \varphi_k$. We will construct a Bayesian network such that $P(X = t) > 0$ iff $\varphi$ is satisfiable.

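25 $P(Q_i = \text{true}) = 0.5$; $P(\varphi_I = \text{true} \mid Q_i, Q_j, Q_l) = 1$ iff $Q_i, Q_j, Q_l$ satisfy the clause $\varphi_I$; $A_1, A_2, \dots$ are simple binary AND gates. [Figure: roots $Q_1,\dots,Q_n$ feed the clause nodes $\varphi_1,\dots,\varphi_k$, which are combined by a chain of AND gates $A_1, A_2, \dots, A_{k-2}$ ending in the output node $X$.]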

26 It is easy to check that the network has a polynomial number of variables; that each conditional probability table can be described by a small table (8 parameters at most); and that $P(X = \text{true}) > 0$ if and only if there exists a satisfying assignment to $Q_1,\dots,Q_n$. Conclusion: a polynomial reduction of 3-SAT.

27 Inference is even #P-hard. $P(X = t)$ is the fraction of satisfying assignments to $\varphi$. Hence $2^n P(X = t)$ is the number of satisfying assignments to $\varphi$. Thus, if we know how to compute $P(X = t)$, we know how to count the number of satisfying assignments to $\varphi$. Consequently, computing $P(X = t)$ is #P-hard.
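The reduction can be exercised on tiny formulas. The sketch below builds the network implicitly:

- uniform roots Q_i;
- deterministic clause CPTs;
- a chain of AND gates ending in X.

It computes P(X = true) by summing the product of all CPT entries over every joint assignment. The literal encoding and the two example formulas are hypothetical. The brute force is exponential and only checks two things: that P(X = t) > 0 iff φ is satisfiable, and that 2^n P(X = t) counts the satisfying assignments.

```python
from itertools import product

def clause_true(clause, q):
    """A clause is a tuple of literals (index, positive); true if any literal holds."""
    return any(q[i] == positive for i, positive in clause)

def prob_x_true(n, clauses):
    """P(X = true) in the reduction network, by summing products of CPT entries."""
    k = len(clauses)
    total = 0.0
    for q in product([False, True], repeat=n):
        for c in product([False, True], repeat=k):
            for g in product([False, True], repeat=k - 1):
                p = 0.5 ** n                                   # P(Q_1, ..., Q_n)
                for ci, clause in zip(c, clauses):             # clause CPTs (0/1)
                    p *= 1.0 if ci == clause_true(clause, q) else 0.0
                prev = c[0]
                for gj, cj in zip(g, c[1:]):                   # AND-gate CPTs (0/1)
                    p *= 1.0 if gj == (prev and cj) else 0.0
                    prev = gj
                if prev:                                       # prev is the node X
                    total += p
    return total

def count_satisfying(n, clauses):
    """Direct model count, for comparison."""
    return sum(all(clause_true(cl, q) for cl in clauses)
               for q in product([False, True], repeat=n))

# Hypothetical formulas; (i, True) means Q_{i+1}, (i, False) means not Q_{i+1}
sat = [((0, True), (1, True), (2, False)), ((0, False), (1, True), (2, True))]
unsat = [((0, True),) * 3, ((0, False),) * 3]
for clauses, n in [(sat, 3), (unsat, 1)]:
    p = prob_x_true(n, clauses)
    print(p, 2 ** n * p, count_satisfying(n, clauses))
```

For the satisfiable formula this prints 0.75 6.0 6; for the unsatisfiable one it prints 0.0 0.0 0.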

28 Hardness - Notes. We need not use deterministic relations in our construction. The construction shows that hardness holds even for graphs of small degree. Hardness does not mean we cannot do inference. It implies that we cannot find a general procedure that works efficiently for all networks. For particular families of networks, we can have provably efficient procedures (e.g., trees, HMMs), for example variable elimination algorithms.
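Chains (the backbone of HMMs) are a family where inference is provably efficient. The sketch below computes P(X_n) in a chain X_1 → … → X_n in two ways and checks that they agree. The first eliminates X_1, X_2, … in turn, costing O(n d²). The second sums the full joint by brute force, costing O(dⁿ). The chain length, domain size and random parameters are hypothetical.

```python
from itertools import product
import numpy as np

def chain_marginal_ve(prior, transitions):
    """P(X_n) by eliminating X_1, ..., X_{n-1} in order: O(n * d^2)."""
    message = prior                      # distribution of the current variable
    for trans in transitions:            # trans[a, b] = P(X_{i+1} = b | X_i = a)
        message = message @ trans        # sum over a of P(X_i = a) P(X_{i+1} | a)
    return message

def chain_marginal_brute(prior, transitions):
    """P(X_n) by summing the full joint over all d**n assignments."""
    d = len(prior)
    result = np.zeros(d)
    for xs in product(range(d), repeat=len(transitions) + 1):
        p = prior[xs[0]]
        for trans, (a, b) in zip(transitions, zip(xs, xs[1:])):
            p *= trans[a, b]
        result[xs[-1]] += p
    return result

rng = np.random.default_rng(2)
d, n = 3, 6                              # hypothetical sizes
prior = rng.dirichlet(np.ones(d))
transitions = [rng.dirichlet(np.ones(d), size=d) for _ in range(n - 1)]
ve = chain_marginal_ve(prior, transitions)
brute = chain_marginal_brute(prior, transitions)
print(np.allclose(ve, brute))            # True
```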

29 Extra slides with more details, if time allows.

30 Chordal Graphs

31 Example of the Theorem. 1. Each cycle has a chord. 2. There is a way to direct edges legally, namely A → B, A → C, B → C, B → D, C → D, C → E. 3. A legal removal order (e.g.): start with E, then D, then the rest. 4. The maximal cliques form a join (clique) tree.
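The sketch below checks conditions 3 and 4 of this example in code. It eliminates vertices in the stated order (E, then D, then C, B, A), verifying that each vertex is simplicial when removed. It collects the maximal cliques along the way. Listed in reverse elimination order, the cliques should satisfy the running intersection property, which is what makes them a join tree. The function names are hypothetical.

```python
def elimination_cliques(edges, order):
    """Eliminate vertices in `order`. If each is simplicial when removed (a legal
    removal order), return the maximal cliques in elimination order; else None."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = []
    for v in order:
        nb = adj.pop(v)
        if any(b not in adj[a] for a in nb for b in nb if a != b):
            return None                        # v was not simplicial
        for u in nb:
            adj[u].discard(v)
        cliques.append(frozenset(nb | {v}))
    return [c for c in cliques if not any(c < other for other in cliques)]

def has_running_intersection(cliques):
    """Each clique meets the union of earlier ones inside a single earlier clique."""
    seen = set()
    for i, c in enumerate(cliques):
        if i > 0 and not any(c & seen <= earlier for earlier in cliques[:i]):
            return False
        seen |= c
    return True

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("C", "E")]
cliques = elimination_cliques(edges, ["E", "D", "C", "B", "A"])
print(cliques)   # the cliques {C, E}, {B, C, D}, {A, B, C}
print(has_running_intersection(list(reversed(cliques))))  # True
```

Starting instead with B fails immediately: its neighbours A and D are not adjacent, so the function returns None.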

32 Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P.

33 Proof of Theorem X. Given a graph G, it is sufficient to show the following. For an independence statement σ = I(α, Z, β) that does NOT hold in G, there exists a probability distribution that satisfies all independence statements that hold in the graph and does not satisfy σ = I(α, Z, β). Well, simply pick a path in G between α and β that does not contain a node from Z. Define a probability distribution that is a perfect map of this chain, and multiply it by any marginal probabilities on all other nodes, forming P_σ. Now "multiply" all P_σ (Armstrong relation) to obtain P. Interesting task (replacing HMW #4): given an undirected graph over binary variables, construct a perfect-map probability distribution. (Note: most Markov random fields are perfect maps!)
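The first step of the proof needs a path between α and β that avoids Z. Such a path exists exactly when Z does not separate α from β in G. The sketch below finds one by breadth-first search on a hypothetical four-node graph. The subsequent construction of P_σ is not attempted here.

```python
from collections import deque

def path_avoiding(adj, alpha, beta, z):
    """BFS for a path alpha ... beta with no node in z; None if z separates them."""
    if alpha in z or beta in z:
        return None
    parent = {alpha: None}
    queue = deque([alpha])
    while queue:
        v = queue.popleft()
        if v == beta:                      # rebuild the path from the parent links
            path = []
            while v is not None:
                path.append(v)
                v = parent[v]
            return path[::-1]
        for w in adj[v]:
            if w not in parent and w not in z:
                parent[w] = v
                queue.append(w)
    return None

# Hypothetical undirected graph: a square a-b-d-c-a
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(path_avoiding(adj, "a", "d", {"b"}))       # ['a', 'c', 'd']
print(path_avoiding(adj, "a", "d", {"b", "c"}))  # None: I(a, {b, c}, d) holds in G
```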

34 Interesting conclusion of Theorem X: all independence statements that follow, for strictly positive distributions, from the neighborhood basis are derivable via symmetry, decomposition, intersection, and weak union. These axioms are (sound and) complete for neighborhood bases, and also for pairwise bases. In fact, for saturated statements, conditional independence and separation have the same characterization. See paper P2.