Markov Random Fields Presented by: Vladan Radosavljevic
Outline Intuition Simple Example Theory Simple Example - Revisited Application Summary References
Intuition Simple example Observation: a noisy binary image whose pixel values are -1 or +1 Objective: recover the noise-free image
Intuition An idea Represent pixels as random variables: y_i - observed (noisy) pixel value, x_i - hidden (original) pixel value Both x and y are binary variables (-1 or +1) Question: is there any relation among these variables?
Intuition Building a model Values of observed and original pixels should be correlated (small level of noise) - make connections! Values of neighboring pixels should be correlated (large homogeneous areas, objects) - make connections! Final model: a grid of hidden pixels x_i, each connected to its own observation y_i and to its neighboring hidden pixels
Intuition Why do we need a model? y is given, x has to be found The objective is to find an image x that maximizes p(x|y) Use the model to penalize connected pairs whose values have opposite signs, since they are not correlated Assume the distribution p(x,y) ~ exp(-E) with E(x,y) = -β Σ_{i,j} x_i x_j - η Σ_i x_i y_i, where the first sum runs over all pairs of connected hidden nodes and β, η are positive weights If x_i and x_j have the same sign, E decreases and the probability is higher The same holds for connected pairs x_i and y_i
Intuition How to find an image x which maximizes the probability p(x|y)? Start by assuming x = y Take one node x_i at a time and evaluate E for x_i = +1 and x_i = -1 Set x_i to the value that gives the lower E (higher probability) Iterate through all nodes until convergence This greedy, node-by-node method finds a local optimum (see the sketch below)
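A minimal Python sketch of this greedy node-by-node update (often called iterated conditional modes, ICM), assuming the pairwise energy defined on the previous slide; the weights beta and eta are illustrative choices, not values fixed by the slides.

import numpy as np

def local_energy(x, y, i, j, beta, eta):
    """Terms of E(x, y) that involve pixel (i, j):
    -beta * x[i, j] * (sum of its 4-neighbours) - eta * x[i, j] * y[i, j]."""
    h, w = x.shape
    nbr_sum = 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < h and 0 <= nj < w:
            nbr_sum += x[ni, nj]
    return -beta * x[i, j] * nbr_sum - eta * x[i, j] * y[i, j]

def icm_denoise(y, beta=1.0, eta=2.0, max_sweeps=10):
    """Start from x = y, then repeatedly set each hidden pixel to the value
    in {-1, +1} with the lower local energy, until no pixel changes."""
    x = y.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                old = x[i, j]
                energies = {}
                for v in (-1, +1):
                    x[i, j] = v
                    energies[v] = local_energy(x, y, i, j, beta, eta)
                x[i, j] = min(energies, key=energies.get)
                changed = changed or (x[i, j] != old)
        if not changed:  # converged: a full sweep made no changes
            break
    return x

# Usage: flip 10% of the pixels of a clean binary image, then restore it.
rng = np.random.default_rng(0)
clean = np.ones((32, 32), dtype=int)
clean[8:24, 8:24] = -1
noisy = np.where(rng.random(clean.shape) < 0.1, -clean, clean)
restored = icm_denoise(noisy)
print("pixels still wrong:", int((restored != clean).sum()))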
Intuition Result
Theory Graphical models – a general framework for representing and manipulating joint distributions defined over sets of random variables Each variable is associated with a node in a graph Edges in the graph represent dependencies between random variables Directed graphs: represent causative relationships (Bayesian Networks) Undirected graphs: represent correlative relationships (Markov Random Fields) Representational aspect: efficiently represent complex independence relations Computational aspect: efficiently infer information about data using independence relations
Theory Recall If there are M variables in the model, each having K possible states, a straightforward inference algorithm will be exponential in the size of the model (K^M) However, inference algorithms (whether computing distributions, expectations, etc.) can use structure in the graph for the purposes of efficient computation
Theory How to use information from the structure? Markov property: if all paths that connect nodes in set A to nodes in set B pass through nodes in set C, then we say that A and B are conditionally independent given C: p(A,B|C) = p(A|C)p(B|C) The main idea is to factorize the joint probability and then use the sum and product rules for efficient computation (a small numerical check follows below)
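A small numerical check of this property on a hypothetical three-node chain A - C - B with binary states (an illustration, not an example from the slides): the joint is a product of edge potentials, and conditioning on the middle node C makes A and B independent.

import itertools
import numpy as np

# Arbitrary positive potentials on the two edges of the chain A - C - B.
psi_AC = np.array([[2.0, 1.0],
                   [1.0, 3.0]])      # psi_AC[a, c]
psi_CB = np.array([[1.5, 0.5],
                   [1.0, 2.0]])      # psi_CB[c, b]

# Joint distribution p(a, c, b) proportional to psi_AC[a, c] * psi_CB[c, b].
joint = np.zeros((2, 2, 2))
for a, c, b in itertools.product(range(2), repeat=3):
    joint[a, c, b] = psi_AC[a, c] * psi_CB[c, b]
joint /= joint.sum()

# Verify p(A, B | C) = p(A | C) * p(B | C) for each value of C.
for c in range(2):
    p_ab_given_c = joint[:, c, :] / joint[:, c, :].sum()
    p_a_given_c = p_ab_given_c.sum(axis=1)
    p_b_given_c = p_ab_given_c.sum(axis=0)
    assert np.allclose(p_ab_given_c, np.outer(p_a_given_c, p_b_given_c))
print("A and B are conditionally independent given C")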
Theory General factorization If two nodes x_i and x_k are not connected, then they have to be conditionally independent given all other nodes There is no link between those two nodes, and all other paths between them pass through nodes that are observed This can be expressed as p(x_i, x_k | x_rest) = p(x_i | x_rest) p(x_k | x_rest), where x_rest denotes all remaining variables Therefore, the joint distribution must be factorized such that unconnected nodes do not appear in the same factor This leads to the concept of a clique: a subset of nodes such that all pairs of nodes in the subset are connected The factors are defined as functions on the cliques of the graph
Theory Example Factorization: p(x) = (1/Z) Π_C ψ_C(x_C), where ψ_C(x_C) is the potential function on the clique C and Z = Σ_x Π_C ψ_C(x_C) is the partition function (normalization constant)
Theory Since potential functions have to be positive, we can define them as ψ_C(x_C) = exp(-E(x_C)) (there is also a theorem, due to Hammersley and Clifford, that proves the correspondence between this distribution and Markov Random Fields; E is the energy function) Recall from the image example that E was a sum of pairwise terms, E(x,y) = -β Σ_{i,j} x_i x_j - η Σ_i x_i y_i (a worked example follows below)
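A minimal sketch of this factorization on a hypothetical four-node graph with maximal cliques {0, 1, 2} and {2, 3} and binary variables (the graph and energies are illustrative assumptions): each potential is psi_C = exp(-E_C) and Z is computed by summing the product of potentials over all joint configurations.

import itertools
import math

cliques = [(0, 1, 2), (2, 3)]        # maximal cliques of the assumed graph

def clique_energy(values):
    """Toy clique energy: lower (better) when the values in the clique agree."""
    return -sum(a * b for a, b in itertools.combinations(values, 2))

def unnormalised(x):
    """Product of clique potentials psi_C(x_C) = exp(-E_C(x_C))."""
    p = 1.0
    for clique in cliques:
        p *= math.exp(-clique_energy([x[i] for i in clique]))
    return p

# Partition function Z: sum over all 2^4 joint configurations of the four binary variables.
states = list(itertools.product((-1, +1), repeat=4))
Z = sum(unnormalised(x) for x in states)

# Normalised joint distribution p(x) = (1/Z) * prod_C psi_C(x_C); it must sum to 1.
p = {x: unnormalised(x) / Z for x in states}
print("sum of probabilities:", round(sum(p.values()), 10))
best = max(p.values())
print("most probable configurations:", [x for x in states if abs(p[x] - best) < 1e-12])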
Theory Why is this useful? Inference algorithms can take advantage of such a representation to significantly increase computational efficiency Example: inference on a chain x_1 - x_2 - ... - x_N with K states per node To find the marginal distribution p(x_n), direct summation of the joint over all other variables requires ~ K^N operations
Theory If we rearrange the order of summations and multiplications, pushing each sum inside so that it involves only one edge potential at a time (message passing along the chain), the cost drops to ~ N K^2 (see the sketch below)
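A sketch of the rearrangement on a hypothetical chain of N = 8 nodes with K = 3 states each and arbitrary edge potentials (sizes chosen only for illustration): the brute-force marginal sums over all K^N configurations, while passing sums along the chain needs only about N * K^2 multiply-adds, and both give the same answer.

import itertools
import numpy as np

rng = np.random.default_rng(1)
N, K = 8, 3                                       # chain length and states per node
psi = [rng.random((K, K)) + 0.1 for _ in range(N - 1)]   # psi[n][i, j] on edge (x_n, x_{n+1})

def brute_force_marginal(node):
    """Sum the product of all edge potentials over every one of the K**N configurations."""
    marg = np.zeros(K)
    for x in itertools.product(range(K), repeat=N):
        w = 1.0
        for n in range(N - 1):
            w *= psi[n][x[n], x[n + 1]]
        marg[x[node]] += w
    return marg / marg.sum()

def message_passing_marginal(node):
    """Push each sum inside the product: a forward message from the left end and a
    backward message from the right end, each update a K x K matrix-vector product."""
    fwd = np.ones(K)
    for n in range(node):                          # message arriving at `node` from the left
        fwd = psi[n].T @ fwd
    bwd = np.ones(K)
    for n in range(N - 2, node - 1, -1):           # message arriving at `node` from the right
        bwd = psi[n] @ bwd
    marg = fwd * bwd
    return marg / marg.sum()

node = N // 2
print(np.allclose(brute_force_marginal(node), message_passing_marginal(node)))   # True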
Application
Summary Advantages: graphical representation, computational efficiency Disadvantages: parameter estimation, how to define a model?, computing probabilities is sometimes difficult
References [1] A. T. Ihler, S. Kirshner, M. Ghil, A. W. Robertson, and P. Smyth, "Graphical models for statistical inference and data assimilation," Physica D: Nonlinear Phenomena, vol. 230, no. 1-2, pp. 72-87, June 2007.