Undirected Graphical Models
Graphical Models – 10708, Carlos Guestrin, Carnegie Mellon University, October 29th, 2008
Readings: K&F: 4.1, 4.2, 4.3, 4.4

Normalization for computing probabilities
To compute actual probabilities from the product of potentials, we must compute the normalization constant Z (also called the partition function): Z is the sum of the product of potentials over all assignments.
Computing the partition function is hard: it requires summing over all possible assignments.

Factorization in Markov networks
Given an undirected graph H over variables X = {X_1, ..., X_n}, a distribution P factorizes over H if there exist:
- subsets of variables D_1 ⊆ X, ..., D_m ⊆ X, such that each D_i is fully connected in H, and
- non-negative potentials (or factors) φ_1(D_1), ..., φ_m(D_m), also known as clique potentials,
such that P(X_1, ..., X_n) = (1/Z) ∏_i φ_i(D_i), where Z is the partition function.
Such a P is also called a Markov random field over H, or a Gibbs distribution over H.
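To make the factorization and the normalization constant concrete, here is a minimal brute-force sketch in Python (my own illustration, not course code); the chain structure A – B – C and the potential values are made up for the example.

```python
import itertools

# Hypothetical clique potentials for a tiny Markov net over binary A, B, C
# with edges A - B and B - C (values chosen arbitrarily for illustration).
phi_AB = {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0}
phi_BC = {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}

def unnormalized(a, b, c):
    """Product of the clique potentials for one full assignment."""
    return phi_AB[(a, b)] * phi_BC[(b, c)]

# Partition function: sum the unnormalized product over ALL assignments.
# This explicit sum is what becomes intractable for large networks.
Z = sum(unnormalized(a, b, c)
        for a, b, c in itertools.product([0, 1], repeat=3))

def prob(a, b, c):
    """Normalized probability P(A=a, B=b, C=c)."""
    return unnormalized(a, b, c) / Z

print(Z, prob(0, 0, 0))
```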

Global Markov assumption in Markov networks
A path X_1 – ... – X_k is active, given a set of observed variables Z, if none of the X_i in {X_1, ..., X_k} are observed (i.e., none are in Z).
Variables X are separated from Y given Z in graph H, written sep_H(X; Y | Z), if there is no active path between any X ∈ X and any Y ∈ Y given Z.
The global Markov assumption for a Markov network H is I(H) = {(X ⊥ Y | Z) : sep_H(X; Y | Z)}.
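Separation can be checked mechanically: never enter an observed node and test reachability. The sketch below is my own illustration, assuming the graph is given as an adjacency dictionary mapping each node to the set of its neighbors.

```python
from collections import deque

def separated(adj, X, Y, Z):
    """Return True if every node in X is separated from every node in Y
    given Z in the undirected graph adj (dict: node -> set of neighbors).
    A path is active only if none of its nodes is observed (in Z), so a
    BFS that never enters an observed node finds all active paths."""
    X, Y, Z = set(X), set(Y), set(Z)
    visited = set(X) - Z
    queue = deque(visited)
    while queue:
        node = queue.popleft()
        if node in Y:
            return False          # found an active path into Y
        for nbr in adj[node]:
            if nbr not in visited and nbr not in Z:
                visited.add(nbr)
                queue.append(nbr)
    return True

# Example: chain A - B - C; A and C are separated given B, but not given {}.
adj = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
print(separated(adj, {'A'}, {'C'}, {'B'}))   # True
print(separated(adj, {'A'}, {'C'}, set()))   # False
```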

The BN Representation Theorem (review)
One direction: if the conditional independencies in the BN structure G are a subset of the conditional independencies in P, then we obtain the joint probability distribution as P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}). Important because: independencies are sufficient to obtain a BN structure G.
Other direction: if the joint probability distribution factorizes as P(X_1, ..., X_n) = ∏_i P(X_i | Pa_{X_i}), then the conditional independencies in the BN are a subset of the conditional independencies in P. Important because: we can read independencies of P from the BN structure G.

Markov networks representation Theorem 1
If the joint probability distribution P factorizes over H (a normalized product of factors over fully connected subsets of H), then H is an I-map for P.
In words: if you can write the distribution as a normalized product of factors, you can read independencies from the graph.

What about the other direction for Markov networks?
The question: if H is an I-map for P, is the joint probability distribution P guaranteed to factorize over H? Not in general.
Counter-example: X_1, ..., X_4 are binary, and only eight assignments have positive probability. The independencies required by the graph hold, for example X_1 ⊥ X_3 | X_2, X_4 (one can check, e.g., P(X_1=0 | X_2=0, X_4=0)), but the distribution does not factorize over H.

Markov networks representation Theorem 2 (Hammersley–Clifford Theorem)
If H is an I-map for P and P is a positive distribution, then P factorizes over H.
In words: a positive distribution plus the independencies implies that P factorizes over the graph.

Representation Theorem for Markov Networks (both directions)
If the joint probability distribution P factorizes over H, then H is an I-map for P.
If H is an I-map for P and P is a positive distribution, then P factorizes over H.

Completeness of separation in Markov networks
Theorem (completeness of separation): for "almost all" distributions P that factorize over a Markov network H, we have I(H) = I(P).
"Almost all" distributions: except for a set of measure zero of parameterizations of the potentials (assuming no finite set of parameterizations has positive measure).
Analogous to the result for BNs.

What are the "local" independence assumptions for a Markov network?
In a BN G: the local Markov assumption says a variable is independent of its non-descendants given its parents; d-separation defines the global independencies; soundness holds for all distributions.
In a Markov net H: separation defines the global independencies. What are the corresponding notions of local independencies?

Local independence assumptions for a Markov network
Separation defines the global independencies.
Pairwise Markov independence: any pair of non-adjacent variables A, B is independent given all other variables.
Markov blanket: a variable A is independent of the rest given its neighbors.
(Slide figure: a grid-structured Markov network over variables T_1, ..., T_9.)
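Both local notions are purely graph-theoretic, so they are easy to enumerate. A small sketch (my own illustration, using the same adjacency-dictionary representation as above):

```python
from itertools import combinations

def markov_blanket(adj, node):
    """In a Markov network, the Markov blanket of a node is its set of neighbors."""
    return set(adj[node])

def pairwise_markov_statements(adj):
    """Yield the pairwise Markov independencies implied by the graph:
    for every non-adjacent pair (A, B), A is independent of B
    given all remaining variables."""
    nodes = set(adj)
    for a, b in combinations(sorted(nodes), 2):
        if b not in adj[a]:
            yield (a, b, nodes - {a, b})

# Example: chain A - B - C.
adj = {'A': {'B'}, 'B': {'A', 'C'}, 'C': {'B'}}
print(markov_blanket(adj, 'B'))                 # {'A', 'C'}
print(list(pairwise_markov_statements(adj)))    # [('A', 'C', {'B'})]
```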

Equivalence of independencies in Markov networks
Soundness theorem: for all positive distributions P, the following three statements are equivalent:
- P entails the global Markov assumptions
- P entails the pairwise Markov assumptions
- P entails the local Markov assumptions (Markov blanket)

Minimal I-maps and Markov networks
A fully connected graph is always an I-map.
Remember minimal I-maps? A "simplest" I-map: deleting any edge makes it no longer an I-map. In a BN, there is no unique minimal I-map.
Theorem: for a positive distribution, the minimal I-map Markov network is unique.
Many ways to find the minimal I-map, e.g., take each pairwise Markov assumption (A independent of B given all other variables); if P does not entail it, add the edge A–B.

How about a perfect map?
Remember perfect maps? The independencies in the graph are exactly the same as those in P.
For BNs, a perfect map doesn't always exist (counter-example: Swinging Couples). How about for Markov networks?

Unifying properties of BNs and MNs
BNs give you: v-structures, CPTs that are conditional probabilities, and the ability to directly compute the probability of a full instantiation. But they require acyclicity, and thus have no perfect map for swinging couples.
MNs give you: cycles, and perfect maps for swinging couples. But they have no v-structures, potentials cannot be interpreted as probabilities, and they require a partition function.
Remember PDAGs? Skeleton + immoralities; provides a (somewhat) unified representation; see the book for details.

What you need to know so far about Markov networks
Markov network representation: undirected graph; potentials over cliques (or sub-cliques); normalize to obtain probabilities; need the partition function.
Representation theorem for Markov networks: if P factorizes over H, then H is an I-map for P; if H is an I-map for P, P factorizes over H only for positive distributions.
Independence in Markov nets: active paths and separation; pairwise Markov and Markov blanket assumptions; equivalence for positive distributions.
Minimal I-maps in MNs are unique. Perfect maps don't always exist.

Some common Markov networks and generalizations
- Pairwise Markov networks
- A very simple application in computer vision
- Logarithmic representation
- Log-linear models
- Factor graphs

Pairwise Markov Networks
All factors are over single variables or pairs of variables: node potentials φ_i(X_i) and edge potentials φ_ij(X_i, X_j).
Factorization: P(X_1, ..., X_n) ∝ ∏_i φ_i(X_i) ∏_{(i,j) ∈ E} φ_ij(X_i, X_j).
Note that there may be bigger cliques in the graph, but we only consider pairwise potentials.
(Slide figure: a grid-structured pairwise Markov network over T_1, ..., T_9.)
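A minimal sketch of the pairwise factorization (illustration only; the two-node example and its potential values are made up):

```python
import itertools

def pairwise_unnormalized(assignment, node_pot, edge_pot):
    """Unnormalized probability of a full assignment under a pairwise MN.
    node_pot: {i: {x_i: value}}; edge_pot: {(i, j): {(x_i, x_j): value}}."""
    p = 1.0
    for i, table in node_pot.items():
        p *= table[assignment[i]]
    for (i, j), table in edge_pot.items():
        p *= table[(assignment[i], assignment[j])]
    return p

# Toy example: two binary nodes connected by one edge, arbitrary potentials.
node_pot = {0: {0: 1.0, 1: 2.0}, 1: {0: 1.0, 1: 1.0}}
edge_pot = {(0, 1): {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}}

Z = sum(pairwise_unnormalized({0: a, 1: b}, node_pot, edge_pot)
        for a, b in itertools.product([0, 1], repeat=2))
print(pairwise_unnormalized({0: 1, 1: 1}, node_pot, edge_pot) / Z)
```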

A very simple vision application
Image segmentation: separate foreground from background.
Graph structure: a pairwise Markov net; a grid with one node per pixel.
Node potential: how well the pixel matches the "background color" vs. the "foreground color".
Edge potential: neighboring pixels prefer to be of the same class.

Logarithmic representation
Standard model: P(X_1, ..., X_n) ∝ ∏_i φ_i(D_i).
Log representation of a potential (assuming the potential is positive): φ_i(D_i) = exp(-ε_i(D_i)), where ε_i is also called the energy function.
Log representation of the Markov net: P(X_1, ..., X_n) ∝ exp(-Σ_i ε_i(D_i)).
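A tiny numerical check of the log representation (illustrative values only): the energy is the negative log of the potential, and exponentiating the negated energy recovers the potential.

```python
import math

# A positive pairwise potential and its energy: energy(d) = -log(phi(d)),
# so phi(d) = exp(-energy(d)). Values are illustrative only.
phi = {(0, 0): 10.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 10.0}

energy = {d: -math.log(v) for d, v in phi.items()}
recovered = {d: math.exp(-e) for d, e in energy.items()}

# The unnormalized model can equivalently be written as a product of
# potentials or as exp(-(sum of energies)) over the same cliques.
assert all(abs(recovered[d] - phi[d]) < 1e-9 for d in phi)
print(energy)
```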

Log-linear Markov network (most common representation)
A feature is some function f[D] over a subset of variables D, e.g., an indicator function.
A log-linear model over a Markov network H consists of:
- a set of features f_1[D_1], ..., f_k[D_k], where each D_i is a subset of a clique in H (two features can be over the same variables), and
- a set of weights w_1, ..., w_k, usually learned from data,
such that P(X_1, ..., X_n) ∝ exp(Σ_i w_i f_i[D_i]).
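A minimal log-linear sketch (my own illustration; the two features and their weights are made up rather than learned):

```python
import math
import itertools

# A tiny log-linear model over two binary variables (X0, X1).
# Each feature is an indicator function over a subset of the variables.
features = [
    lambda x: 1.0 if x[0] == x[1] else 0.0,   # f1[X0, X1]: agreement
    lambda x: 1.0 if x[0] == 1 else 0.0,      # f2[X0]: bias on X0
]
weights = [1.5, -0.5]   # w1, w2 (in practice, learned from data)

def unnormalized(x):
    """exp(sum_i w_i * f_i(x))"""
    return math.exp(sum(w * f(x) for w, f in zip(weights, features)))

Z = sum(unnormalized(x) for x in itertools.product([0, 1], repeat=2))
for x in itertools.product([0, 1], repeat=2):
    print(x, unnormalized(x) / Z)
```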

Structure in cliques
For a graph over variables A, B, C, several sets of potentials are possible, e.g., a single potential over the full clique, φ(A,B,C), or pairwise potentials such as φ(A,B), φ(B,C), φ(A,C). The graph alone does not determine which factorization is used.

Factor graphs
Very useful for approximate inference: they make the factor dependencies explicit.
A factor graph is a bipartite graph with:
- variable nodes (ovals) for X_1, ..., X_n
- factor nodes (squares) for φ_1, ..., φ_m
- an edge X_i – φ_j if X_i ∈ Scope[φ_j]
(Slide figure: a factor graph over A, B, C.)
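A minimal factor-graph data structure (my own illustration; the class and method names are invented for this sketch):

```python
# A bipartite graph with variable nodes and factor nodes, plus factor scopes.
class FactorGraph:
    def __init__(self):
        self.variables = set()
        self.factors = {}        # factor name -> scope (tuple of variables)
        self.edges = set()       # (variable, factor name) pairs

    def add_factor(self, name, scope):
        """Add a factor node and connect it to every variable in its scope."""
        self.factors[name] = tuple(scope)
        for v in scope:
            self.variables.add(v)
            self.edges.add((v, name))

    def neighbors_of_variable(self, v):
        return {f for (var, f) in self.edges if var == v}

# Example: the factorization phi1(A, B) * phi2(B, C).
fg = FactorGraph()
fg.add_factor('phi1', ['A', 'B'])
fg.add_factor('phi2', ['B', 'C'])
print(fg.neighbors_of_variable('B'))   # {'phi1', 'phi2'}
```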

Exact inference in MNs and factor graphs
The variable elimination algorithm was presented in terms of factors, so exactly the same VE algorithm can be applied to MNs and factor graphs.
Junction tree algorithms also apply directly here: triangulate the MN graph as we did with the moralized graph; each factor belongs to a clique; and we use the same message passing algorithms.
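To emphasize that VE is defined purely on factors, here is a compact sketch of variable elimination over tables of binary variables (my own simplified illustration, not the course implementation):

```python
import itertools

# A factor is (scope, table): scope is a tuple of variable names and
# table maps full assignments over the scope (tuples of 0/1) to values.

def multiply(f, g):
    """Multiply two factors, aligning them on shared variables."""
    (fs, ft), (gs, gt) = f, g
    scope = tuple(dict.fromkeys(fs + gs))            # union, keep order
    table = {}
    for assign in itertools.product([0, 1], repeat=len(scope)):
        ctx = dict(zip(scope, assign))
        table[assign] = (ft[tuple(ctx[v] for v in fs)] *
                         gt[tuple(ctx[v] for v in gs)])
    return scope, table

def marginalize(f, var):
    """Sum a variable out of a factor."""
    fs, ft = f
    scope = tuple(v for v in fs if v != var)
    table = {a: 0.0 for a in itertools.product([0, 1], repeat=len(scope))}
    for assign, value in ft.items():
        ctx = dict(zip(fs, assign))
        table[tuple(ctx[v] for v in scope)] += value
    return scope, table

def variable_elimination(factors, elim_order):
    """Eliminate variables one by one; return the single remaining factor."""
    factors = list(factors)
    for var in elim_order:
        relevant = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = relevant[0]
        for f in relevant[1:]:
            prod = multiply(prod, f)
        factors.append(marginalize(prod, var))
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Example: chain A - B - C with two pairwise factors; query P(C).
phi_AB = (('A', 'B'), {(0, 0): 10.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 10.0})
phi_BC = (('B', 'C'), {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0})
scope, table = variable_elimination([phi_AB, phi_BC], elim_order=['A', 'B'])
Z = sum(table.values())
print(scope, {k: v / Z for k, v in table.items()})
```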

Summary of types of Markov nets
Pairwise Markov networks: very common; potentials over nodes and edges.
Log-linear models: log representation of potentials; linear coefficients learned from data; most common for learning MNs.
Factor graphs: explicit representation of factors, so you know exactly what factors you have; very useful for approximate inference.

What you have learned about so far
Bayes nets, junction trees, (general) Markov networks, pairwise Markov networks, and factor graphs.
How do we transform between them? More formally: given a graph in one representation, find an I-map in the other.

From Bayes nets to Markov nets
(Slide figure: a Bayes net over Intelligence, Grade, SAT, Letter, and Job.)

BNs → MNs: Moralization
Theorem: given a BN G, the Markov net H formed by moralizing G is the minimal I-map for I(G).
Intuition: in a Markov net, each factor must correspond to a subset of a clique; the factors in a BN are the CPTs; a CPT is a factor over a node and its parents; thus a node and its parents must form a clique.
Effect: some independencies that could be read from the BN graph become hidden.
(Slide figure: a Bayes net over Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, and Happy, shown before and after moralization.)
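A sketch of the moralization step itself (my own illustration): connect each node to its parents, marry all co-parents, and drop edge directions.

```python
from itertools import combinations

def moralize(parents):
    """Moralize a BN given as {node: list of parents}; every node must
    appear as a key (possibly with an empty parent list). Returns an
    undirected adjacency dict."""
    adj = {v: set() for v in parents}
    for child, pa in parents.items():
        for p in pa:                      # keep child-parent edges, undirected
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pa, 2):  # marry all pairs of co-parents
            adj[p].add(q)
            adj[q].add(p)
    return adj

# Example: the v-structure I -> G <- D becomes the triangle I - G - D.
parents = {'I': [], 'D': [], 'G': ['I', 'D']}
print(moralize(parents))
```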

From Markov nets to Bayes nets
(Slide figure: a Markov net over Exam, Grade, Intelligence, SAT, Letter, and Job.)

MNs → BNs: Triangulation
Theorem: given a MN H, let G be a Bayes net that is a minimal I-map for I(H); then G must be chordal.
Intuition: v-structures in a BN introduce immoralities; these immoralities were not present in the Markov net; triangulation eliminates the immoralities.
Effect: many independencies that could be read from the MN graph become hidden.
(Slide figure: the Markov net over Exam, Grade, Intelligence, SAT, Letter, and Job, and the chordal Bayes net obtained from it.)
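Chordality can be tested with maximum cardinality search: repeatedly pick the vertex with the most already-picked neighbors and check that those neighbors form a clique. The sketch below is my own illustration of that standard test.

```python
def is_chordal(adj):
    """Check chordality (triangulatedness) of an undirected graph given as
    {node: set of neighbors}, via maximum cardinality search: pick vertices
    one at a time, always choosing one with the most already-picked
    neighbors, and verify those neighbors are pairwise adjacent."""
    order, picked = [], set()
    while len(order) < len(adj):
        v = max((u for u in adj if u not in picked),
                key=lambda u: len(adj[u] & picked))
        earlier = adj[v] & picked
        if any(b not in adj[a] for a in earlier for b in earlier if a != b):
            return False
        picked.add(v)
        order.append(v)
    return True

# A 4-cycle is not chordal; adding a chord makes it chordal.
cycle = {'A': {'B', 'D'}, 'B': {'A', 'C'}, 'C': {'B', 'D'}, 'D': {'C', 'A'}}
print(is_chordal(cycle))                       # False
cycle['A'].add('C'); cycle['C'].add('A')
print(is_chordal(cycle))                       # True
```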

Markov nets vs. pairwise MNs
Every Markov network can be transformed into a pairwise Markov net: introduce an extra "variable" for each factor over three or more variables; the domain size of the extra variable is exponential in the number of variables in the factor.
Effect: any local structure in the factor is lost, and a chordal MN doesn't look chordal anymore.
(Slide figure: a factor over A, B, C and its pairwise construction.)
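A sketch of the construction (my own illustration; function and variable names are invented): the joint assignments of the factor's variables become the states of one auxiliary variable, with indicator-style edge potentials enforcing consistency with each original variable.

```python
import itertools

def to_pairwise(scope, table, domain=(0, 1)):
    """Convert one factor over `scope` (3+ variables) into pairwise form:
    an auxiliary variable Z whose states are the joint assignments of the
    factor, a node potential on Z copying the factor's values, and one
    consistency edge potential between Z and each original variable."""
    z_states = list(itertools.product(domain, repeat=len(scope)))
    node_pot_Z = {z: table[z] for z in z_states}      # exponential domain
    edge_pots = {}
    for i, var in enumerate(scope):
        # phi(Z, X_i) = 1 if Z's assignment agrees with X_i, else 0.
        edge_pots[var] = {(z, x): (1.0 if z[i] == x else 0.0)
                          for z in z_states for x in domain}
    return node_pot_Z, edge_pots

# Example: a single factor over (A, B, C) with arbitrary values.
table = {a: float(i + 1) for i, a in
         enumerate(itertools.product([0, 1], repeat=3))}
node_pot_Z, edge_pots = to_pairwise(('A', 'B', 'C'), table)
print(len(node_pot_Z))   # 8 states: exponential in the factor's size
```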

Overview of types of graphical models and the transformations between them