Markov random fields

The Markov property
Discrete time: $P(X_{k+1}=x_{k+1}\mid X_k=x_k,\dots,X_1=x_1)=P(X_{k+1}=x_{k+1}\mid X_k=x_k)$.
A time symmetric version: $X_{k-1}$ and $X_{k+1}$ are conditionally independent given $X_k$, i.e. $P(x_{k-1},x_{k+1}\mid x_k)=P(x_{k-1}\mid x_k)\,P(x_{k+1}\mid x_k)$.
A more general version: let $A$ be a set of indices $>k$, $B$ a set of indices $<k$. Then $P(X_A,X_B\mid X_k)=P(X_A\mid X_k)\,P(X_B\mid X_k)$.
These are all equivalent.

On a spatial grid
Let $\partial i$ denote the neighbors of location $i$. The Markov assumption is $P(z_i \mid z_j,\, j \ne i) = P(z_i \mid z_j,\, j \in \partial i)$. Equivalently, write $p_i(z) = P(z_i \mid z_{\partial i})$ for these conditional distributions; the $p_i$ are called local characteristics. They are stationary if $p_i = p$. A potential assigns a number $V_A(z)$ to every subconfiguration $z_A$ of a configuration $z$. (There are lots of them!)

Graphical models
Neighbors are nodes connected with edges. In the example graph, given node 2, nodes 1 and 4 are independent.

Gibbs measure
The energy $U$ corresponding to a potential $V$ is $U(z) = \sum_A V_A(z)$. The corresponding Gibbs measure is $P(z) = \exp\{-U(z)\}/Z$, where $Z = \sum_z \exp\{-U(z)\}$ is called the partition function.
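To make the definitions concrete, here is a minimal sketch (with an arbitrary 2x2 binary field and made-up Ising-type pair potentials, not anything from the slides) that computes the partition function and the Gibbs measure by brute-force enumeration; this is only feasible for tiny configurations.

```python
import itertools
import numpy as np

# A tiny 2x2 binary field with 4-nearest-neighbour cliques (hypothetical example).
sites = [(0, 0), (0, 1), (1, 0), (1, 1)]
pairs = [((0, 0), (0, 1)), ((0, 0), (1, 0)),
         ((0, 1), (1, 1)), ((1, 0), (1, 1))]

def energy(z, v0=0.1, v1=0.4):
    """U(z) = -v0*sum_i z_i - v1*sum_{i~j} z_i z_j (illustrative Ising-type potential)."""
    singles = sum(z[s] for s in sites)
    interactions = sum(z[i] * z[j] for i, j in pairs)
    return -(v0 * singles + v1 * interactions)

# Enumerate all 2^4 configurations with spins in {-1, +1}.
configs = [dict(zip(sites, vals)) for vals in itertools.product([-1, 1], repeat=4)]
weights = np.array([np.exp(-energy(z)) for z in configs])
Z = weights.sum()            # the partition function
probs = weights / Z          # the Gibbs measure
print("partition function Z =", Z)
print("most probable configuration:", configs[int(np.argmax(probs))])
```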

Nearest neighbor potentials
A set of points is a clique if all its members are neighbors. A potential is a nearest neighbor potential if $V_A(z) = 0$ whenever $A$ is not a clique.

Markov random field
Any nearest neighbor potential induces a Markov random field:
$P(z_i \mid z_j,\, j \ne i) = \exp\{-\sum_{C \ni i} V_C(z)\} \big/ \sum_{z_i'} \exp\{-\sum_{C \ni i} V_C(z')\}$,
where $z'$ agrees with $z$ except possibly at $i$, so $V_C(z) = V_C(z')$ for any $C$ not including $i$ and those terms cancel.

The Hammersley-Clifford theorem
Assume $P(z) > 0$ for all $z$. Then $P$ is a MRF on a (finite) graph with respect to a neighbourhood system $\partial$ iff $P$ is a Gibbs measure corresponding to a nearest neighbour potential. Does a given nn potential correspond to a unique $P$?

The Ising model
Model for ferromagnetic spin (values +1 or -1). Stationary nn pair potential $V(i,j) = V(j,i)$; $V(i,i) = V(0,0) = v_0$; $V(0,e_N) = V(0,e_E) = v_1$, so
$P(z) = \exp\{v_0 \sum_i z_i + v_1 \sum_{i \sim j} z_i z_j\} / Z(v_0, v_1)$,
where $Z(v_0, v_1) = \sum_z \exp\{v_0 \sum_i z_i + v_1 \sum_{i \sim j} z_i z_j\}$ is the partition function.

Interpretation
$v_0$ is related to the external magnetic field (if it is strong the field will tend to have the same sign as the external field). $v_1$ corresponds to inverse temperature, so it will be large for low temperatures.

Phase transition
At very low temperature there is a tendency for spontaneous magnetization. For the Ising model, the boundary conditions can affect the distribution of $z_0$, the value at the origin. In fact, there is a critical temperature (or value of $v_1$) such that for temperatures below it the boundary conditions are felt. Thus there can be different probabilities at the origin depending on the values on an arbitrarily distant boundary!

Simulated Ising fields
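Fields like these can be produced with a single-site Gibbs sampler that repeatedly draws each spin from its local characteristic. Below is a minimal sketch; the grid size, parameter values and sweep count are arbitrary choices, not the settings behind the figures.

```python
import numpy as np

def ising_gibbs(n=64, v0=0.0, v1=0.4, sweeps=200, rng=None):
    """Single-site Gibbs sampler for P(z) proportional to exp(v0*sum z_i + v1*sum_{i~j} z_i z_j)."""
    rng = np.random.default_rng(rng)
    z = rng.choice([-1, 1], size=(n, n))
    for _ in range(sweeps):
        for i in range(n):
            for j in range(n):
                # Sum of the (up to 4) neighbouring spins, free boundary.
                s = (z[i - 1, j] if i > 0 else 0) + (z[i + 1, j] if i < n - 1 else 0) \
                    + (z[i, j - 1] if j > 0 else 0) + (z[i, j + 1] if j < n - 1 else 0)
                # Local characteristic: P(z_ij = +1 | neighbours) = 1 / (1 + exp(-2(v0 + v1*s))).
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * (v0 + v1 * s)))
                z[i, j] = 1 if rng.random() < p_plus else -1
    return z

field = ising_gibbs(n=32, v1=0.45, sweeps=100, rng=0)
print("mean spin:", field.mean())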

Tomato disease
Data on spotted wilt from the Waite Institute: plots in a 4x4 Latin square, each with 6 rows of 15 plants. Occurrence of the viral disease was recorded 23 days after planting.

Exponential tilting
The likelihood involves the intractable partition function. Writing $t(z)$ for the sufficient statistics, $Z(\theta)/Z(u) = E_u\!\left[\exp\{(\theta - u)^T t(z)\}\right]$, so simulate from the model at a reference parameter $u$ and estimate the expectation by an average.
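A sketch of that Monte Carlo step, assuming you already have draws of the sufficient statistic $t(z)$ simulated at the reference parameter $u$ (e.g. from a Gibbs sampler like the one above); the function name and the fabricated example statistics are illustrative only.

```python
import numpy as np
from scipy.special import logsumexp

def log_Z_ratio(t_samples, theta, u):
    """Estimate log Z(theta) - log Z(u) from samples of t(z) drawn under u.

    Uses Z(theta)/Z(u) = E_u[ exp((theta - u) . t(Z)) ], averaged on the log scale.
    """
    t_samples = np.asarray(t_samples, dtype=float)               # shape (m, d)
    delta = np.asarray(theta, dtype=float) - np.asarray(u, dtype=float)
    log_w = t_samples @ delta                                    # log tilted weights, shape (m,)
    return logsumexp(log_w) - np.log(len(log_w))

# Illustrative call with fabricated statistics t = (sum z_i, sum_{i~j} z_i z_j).
rng = np.random.default_rng(1)
fake_t = rng.normal(loc=[0.0, 2000.0], scale=[30.0, 80.0], size=(100_000, 2))
print(log_Z_ratio(fake_t, theta=(0.05, 0.55), u=(0.0, 0.5)))
```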

Fitting the tomato data
$t_0 = \dots$, $t_1 = 2266$. Condition on the boundary and simulate 100,000 draws from $u = (0, 0.5)$. MLE: $(\hat v_0, \hat v_1) = \dots$. The simulated values of $t_0$ are half positive and half negative (phase transition).

The auto-models
Let $Q(z) = \log(P(z)/P(0))$. Besag's auto-models are defined by $Q(z) = \sum_i z_i G_i(z_i) + \sum_{i<j} \beta_{ij} z_i z_j$. When $z_i \in \{0,1\}$ and $G_i(z_i) = \alpha_i$ we get the autologistic model; when the conditional distributions are Poisson and $\beta_{ij} \le 0$ we get the auto-Poisson model.

Coding schemes
In order to estimate parameters, it can be easier not to use all the data. Besag suggested a coding scheme in which one only uses data at points which are conditionally independent given all the other data (e.g. every other point of a regular grid, in a checkerboard pattern).

Pseudolikelihood
Another approximate approach is to write down a function of the data which is the product of the local characteristics, i.e., acting as if the neighborhoods of each point were independent: $\mathrm{PL}(\theta) = \prod_i P(z_i \mid z_{\partial i}; \theta)$. This yields an estimating equation, but not an optimal one. In fact, in cases of high dependence it tends to be biased.
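A sketch of the negative log-pseudolikelihood for the +-1 Ising parametrisation used earlier; maximum pseudolikelihood estimates can then be obtained with a generic optimiser. Function names and starting values are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def neighbour_sum(z):
    """Sum of the 4 nearest-neighbour spins at each site (free boundary)."""
    s = np.zeros_like(z, dtype=float)
    s[1:, :] += z[:-1, :]
    s[:-1, :] += z[1:, :]
    s[:, 1:] += z[:, :-1]
    s[:, :-1] += z[:, 1:]
    return s

def neg_log_pseudolikelihood(params, z):
    """-sum_i log P(z_i | neighbours) for the Ising model, z an array of +-1 spins."""
    v0, v1 = params
    a = v0 + v1 * neighbour_sum(z)
    # log P(z_i | nbrs) = z_i * a_i - log(exp(a_i) + exp(-a_i))
    return -(z * a - np.logaddexp(a, -a)).sum()

# Usage sketch on a field z (e.g. produced by the Gibbs sampler above):
# fit = minimize(neg_log_pseudolikelihood, x0=[0.0, 0.0], args=(z,))
# print(fit.x)
```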

Recall the Gaussian formula
If $(z_1, z_2)^T \sim N\!\left((\mu_1, \mu_2)^T, \begin{pmatrix}\Sigma_{11} & \Sigma_{12}\\ \Sigma_{21} & \Sigma_{22}\end{pmatrix}\right)$ then $z_1 \mid z_2 \sim N\!\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(z_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right)$. Let $Q = \Sigma^{-1}$ be the precision matrix. Then the conditional precision matrix is $Q_{11}$ (and the conditional mean is $\mu_1 - Q_{11}^{-1}Q_{12}(z_2 - \mu_2)$).
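A quick numerical check of this identity, using a randomly generated positive-definite covariance: the inverse of the Schur complement equals the corresponding block of the precision matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)          # a positive-definite covariance
Q = np.linalg.inv(Sigma)                 # the precision matrix

idx1, idx2 = [0, 1], [2, 3, 4]           # partition the variables into two blocks
S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S22 = Sigma[np.ix_(idx2, idx2)]

cond_cov = S11 - S12 @ np.linalg.inv(S22) @ S12.T   # conditional covariance (Schur complement)
cond_prec_from_Q = Q[np.ix_(idx1, idx1)]            # claimed conditional precision

print(np.allclose(np.linalg.inv(cond_cov), cond_prec_from_Q))   # True
```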

Gaussian MRFs
We want a setup in which $z_i \perp z_j \mid z_{-ij}$ whenever $i$ and $j$ are not neighbors. Using the Gaussian formula we see that this condition is met iff $Q_{ij} = 0$. Typically the precision matrix of a GMRF is sparse where the covariance matrix is not. This allows fast computation of likelihoods, simulation, etc.

An AR(1) process
Let $z_t = \phi z_{t-1} + \epsilon_t$ with $\epsilon_t \sim N(0,1)$ iid. The lag-$k$ autocorrelation is $\phi^{|k|}$. The precision matrix has $Q_{ij} = -\phi$ if $|i-j| = 1$, $Q_{11} = Q_{nn} = 1$, and $Q_{ii} = 1 + \phi^2$ elsewhere. Thus $\Sigma$ has $n^2$ non-zero elements, while $Q$ has $n + 2(n-1) = 3n - 2$ non-zero elements. Using the Gaussian formula we see that, for interior $t$, $z_t \mid z_{-t} \sim N\!\left(\tfrac{\phi}{1+\phi^2}(z_{t-1} + z_{t+1}),\; \tfrac{1}{1+\phi^2}\right)$.
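A sketch constructing this tridiagonal precision (assuming unit innovation variance) and comparing it with the inverse of the dense stationary covariance; the sparse representation makes the $3n-2$ nonzeros explicit.

```python
import numpy as np
import scipy.sparse as sp

n, phi = 200, 0.8

# Stationary AR(1) covariance: Sigma_ij = phi^{|i-j|} / (1 - phi^2), dense with n^2 nonzeros.
idx = np.arange(n)
Sigma = phi ** np.abs(idx[:, None] - idx[None, :]) / (1 - phi**2)

# Tridiagonal precision: Q_11 = Q_nn = 1, Q_ii = 1 + phi^2 otherwise, Q_{i,i+-1} = -phi.
diag = np.full(n, 1 + phi**2)
diag[0] = diag[-1] = 1.0
Q = sp.diags([np.full(n - 1, -phi), diag, np.full(n - 1, -phi)], offsets=[-1, 0, 1])

print("nonzeros in Q:", Q.nnz, "(= 3n - 2 =", 3 * n - 2, ")")
print("Q equals Sigma^{-1}:", np.allclose(Q.toarray(), np.linalg.inv(Sigma)))
```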

Conditional autoregression
Suppose that $z_i \mid z_{-i} \sim N\!\left(\mu_i + \sum_j \beta_{ij}(z_j - \mu_j),\; 1/\kappa_i\right)$. This is called a Gaussian conditional autoregressive model. WLOG $\mu_i = 0$. If also $\kappa_i \beta_{ij} = \kappa_j \beta_{ji}$, these conditional distributions correspond to a multivariate joint Gaussian distribution with mean 0 and precision $Q$ with $Q_{ii} = \kappa_i$ and $Q_{ij} = -\kappa_i \beta_{ij}$, provided $Q$ is positive definite. If the $\beta_{ij}$ are nonzero only when $i \sim j$ we have a GMRF.
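A small sketch assembling the joint precision $Q$ from hypothetical $\kappa_i$ and $\beta_{ij}$ on a line graph, with the $\beta_{ij}$ chosen so that the symmetry condition $\kappa_i \beta_{ij} = \kappa_j \beta_{ji}$ holds; all numbers are made up for illustration.

```python
import numpy as np

# Hypothetical CAR specification on 4 sites arranged in a line: 1 - 2 - 3 - 4.
kappa = np.array([1.0, 2.0, 2.0, 1.0])          # conditional precisions
beta = np.zeros((4, 4))                          # conditional regression weights
edges = [(0, 1), (1, 2), (2, 3)]
for i, j in edges:
    q_ij = 0.5                                   # desired off-diagonal magnitude (hypothetical)
    beta[i, j] = q_ij / kappa[i]                 # ensures kappa_i*beta_ij = kappa_j*beta_ji = q_ij
    beta[j, i] = q_ij / kappa[j]

Q = np.diag(kappa) - np.diag(kappa) @ beta       # Q_ii = kappa_i, Q_ij = -kappa_i * beta_ij
print("Q symmetric:", np.allclose(Q, Q.T))
print("Q positive definite:", np.all(np.linalg.eigvalsh(Q) > 0))
```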

Likelihood calculation
The Cholesky decomposition of a pd square matrix $A$ is a lower triangular matrix $L$ such that $A = LL^T$. To solve $Ay = b$, first solve $Lv = b$ (forward substitution), then $L^T y = v$ (backward substitution). If a precision matrix $Q = LL^T$, then $\log\det(Q) = 2\sum_i \log L_{ii}$. The quadratic form in the likelihood is $w^T u$ where $u = Qw$ and $w = z - \mu$. Note that both terms can be computed quickly when $Q$ is sparse.
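A minimal sketch of the GMRF log-density computed this way, using a dense Cholesky factor for readability (a large sparse $Q$ would call for a sparse Cholesky instead), checked against scipy's multivariate normal on a small example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmrf_logpdf(z, mu, Q):
    """log N(z; mu, Q^{-1}) = -n/2 log(2*pi) + sum_i log L_ii - 1/2 (z-mu)^T Q (z-mu), Q = L L^T."""
    L = np.linalg.cholesky(Q)                    # lower triangular, Q = L @ L.T
    w = z - mu
    logdet = 2.0 * np.log(np.diag(L)).sum()
    quad = w @ (Q @ w)
    return 0.5 * (logdet - quad - len(z) * np.log(2 * np.pi))

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Q = A @ A.T + 4 * np.eye(4)
mu = np.zeros(4)
z = rng.normal(size=4)
print(np.isclose(gmrf_logpdf(z, mu, Q),
                 multivariate_normal(mean=mu, cov=np.linalg.inv(Q)).logpdf(z)))
```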

Simulation
Let $x \sim N(0, I)$, solve $L^T v = x$ and set $z = \mu + v$. Then $E(z) = \mu$ and $\mathrm{Var}(z) = (L^T)^{-1} I L^{-1} = (LL^T)^{-1} = Q^{-1}$.
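A sketch of this sampling recipe, again with a dense Cholesky factor for readability; the empirical covariance of the draws should approach $Q^{-1}$.

```python
import numpy as np
from scipy.linalg import solve_triangular

def gmrf_sample(mu, Q, size, rng=None):
    """Draw from N(mu, Q^{-1}) by solving L^T v = x with Q = L L^T and x ~ N(0, I)."""
    rng = np.random.default_rng(rng)
    L = np.linalg.cholesky(Q)                       # Q = L @ L.T, L lower triangular
    x = rng.standard_normal((len(mu), size))
    v = solve_triangular(L.T, x, lower=False)       # back substitution
    return (mu[:, None] + v).T                      # shape (size, n)

# Check: the empirical covariance approaches Q^{-1}.
Q = np.array([[2.0, -0.5], [-0.5, 1.0]])
samples = gmrf_sample(np.zeros(2), Q, size=200_000, rng=3)
print(np.round(np.cov(samples.T), 3))
print(np.round(np.linalg.inv(Q), 3))
```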

Spatial covariance
Whittle (1963) noted that the solution to the stochastic partial differential equation $(\kappa^2 - \Delta)^{\alpha/2} X(s) = W(s)$, with $W$ white noise, has Matérn covariance function $C(d) \propto (\kappa d)^{\nu} K_{\nu}(\kappa d)$, $\nu = \alpha - D/2$ in dimension $D$. Rue and Tjelmeland (2003) show that one can approximate a Gaussian random field on a grid by a GMRF.
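For reference, a sketch of the Matérn correlation function in this parametrisation, normalised so that $C(0)=1$ (which may differ from the slides' scaling); the $\nu = 1$ call corresponds to Whittle's original two-dimensional case.

```python
import numpy as np
from scipy.special import kv, gamma

def matern(d, kappa, nu):
    """Matern correlation: C(d) = 2^{1-nu}/Gamma(nu) * (kappa*d)^nu * K_nu(kappa*d), C(0) = 1."""
    d = np.asarray(d, dtype=float)
    c = np.ones_like(d)                          # limit at d = 0 is 1
    pos = d > 0
    kd = kappa * d[pos]
    c[pos] = 2 ** (1 - nu) / gamma(nu) * kd ** nu * kv(nu, kd)
    return c

dists = np.linspace(0, 5, 6)
print(np.round(matern(dists, kappa=1.0, nu=1.0), 4))
```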

Unequal spacing
Lindgren and Rue show how one can use finite element methods to approximate the solution to the SPDE (even on a manifold like a sphere) on a triangulation of a set of possibly unequally spaced points.

Covariance approximation

Precipitation in the Sahel
The African Sahel region (just south of the Sahara) suffered severe drought from the 1970s through the 1990s. There is recent evidence of recovery. Data from GHCN at 550 stations (monthly, aggregated to annual).

Model
Precipitation is determined by a latent (hidden) process, modelled as a Gaussian process on the sphere. This is approximated by a GMRF $Z(s_i, t_j)$ on a discrete set of points.

Details
$Y(s,t) = P(s,t)^{1/3}\left(1 - 0.13\, P(s,t)^{1/3}\right) = Z(s,t) + \epsilon(s,t)$. The mean is a linear combination of basis functions (B-splines). The temporal structure is AR(1). Fitting is by MCMC.

Results

Reserved data
Model check by reserving 10 stations.