Image Warping using Empirical Bayes
John Ashburner & Karl J. Friston
Wellcome Department of Cognitive Neurology, London, UK
Slide 2: Bayesian approaches to image warping
¤ Image warping procedures are often posed as regularised maximum-likelihood problems: find the parameters of a warping model that maximise the likelihood of the observed data. What is actually needed is the set of parameters that is most probable given the data. To achieve this, the problem should be considered within a Bayesian framework:
¤ "The posterior probability of the parameters given the data is proportional to the likelihood of observing the data given the parameters, times the prior probability of the parameters."
  P(x|y) ∝ P(y|x) P(x)
¤ Taking logs converts the Bayesian problem into one of simultaneously minimising two cost-function terms. The first is usually a measure of intensity correlation between the images (the likelihood potential), and the second is a measure of the roughness of the warp (the prior potential).
  −log P(x|y) = −log P(y|x) − log P(x) + const
Slide 3: A generative model for image registration
¤ Registration (in one dimension) involves minimising something like:
  λ₁ y⁽ⁿ⁾ᵀ y⁽ⁿ⁾ + λ₂ x⁽ⁿ⁾ᵀ V⁻¹ x⁽ⁿ⁾,   where   yᵢ⁽ⁿ⁾ = gᵢ − f(i + Σₖ bᵢₖ xₖ⁽ⁿ⁾)
¤ gᵢ is the ith voxel of the template image.
¤ f(j) is the interpolated value of the source image at point j.
¤ xₖ⁽ⁿ⁾ is the kth parameter estimate at iteration n.
¤ bᵢₖ is the ith value of the kth spatial basis function.
¤ At each iteration, x is updated by (Ashburner & Friston, 1999):
  x⁽ⁿ⁺¹⁾ = x⁽ⁿ⁾ − (λ₁ A⁽ⁿ⁾ᵀ A⁽ⁿ⁾ + λ₂ V⁻¹)⁻¹ (λ₁ A⁽ⁿ⁾ᵀ y⁽ⁿ⁾ + λ₂ V⁻¹ x⁽ⁿ⁾)
¤ where:
¤ x⁽ⁿ⁾ are the parameter estimates at iteration n.
¤ y⁽ⁿ⁾ are the current differences between the source image and the template at iteration n.
¤ A⁽ⁿ⁾ contains the derivatives of y⁽ⁿ⁾ with respect to each parameter.
¤ V is the form of the a priori covariance matrix of the parameters.
¤ λ₁ and λ₂ are hyper-parameters describing the relative importance of the two cost-function terms.
¤ The choice of λ₁ and λ₂ strongly influences the final solution.
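As a concrete illustration, here is a minimal NumPy sketch of the regularised Gauss-Newton update above for a 1D registration. The function name, the linear interpolation and the finite-difference image gradient are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gauss_newton_step(x, f, g, B, Vinv, lam1, lam2):
    """One regularised Gauss-Newton update of the warp parameters (slide 3).

    x      : current parameter estimates, shape (N,)
    f, g   : 1D source and template images on an integer grid, shape (M,)
    B      : spatial basis functions, B[i, k] = b_ik, shape (M, N)
    Vinv   : inverse of the prior covariance form V, shape (N, N)
    lam1, lam2 : hyper-parameters weighting the likelihood and prior terms
    """
    grid = np.arange(g.shape[0], dtype=float)
    warped = grid + B @ x                            # i + sum_k b_ik x_k
    fw = np.interp(warped, grid, f)                  # f(.) by linear interpolation
    dfw = np.interp(warped, grid, np.gradient(f))    # df/dj at the warped points
    y = g - fw                                       # residuals y_i
    A = -dfw[:, None] * B                            # dy_i/dx_k = -f'(.) b_ik
    H = lam1 * A.T @ A + lam2 * Vinv
    grad = lam1 * A.T @ y + lam2 * Vinv @ x
    return x - np.linalg.solve(H, grad), y, A
```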
Slide 4: Getting the right balance
¤ An ongoing argument within the field of image registration runs something like: "if there are too many parameters in a registration model, then the model will be over-fitted", versus "if too few parameters are used, then the images cannot be registered properly".
¤ A better way of expressing this problem is in terms of obtaining an optimal balance between the cost-function terms, which involves assigning the most appropriate weighting to each.
¤ These weights (the hyper-parameters λ₁ and λ₂ in this example) can be estimated using an Empirical Bayes approach, which employs an expectation maximisation (EM) optimisation scheme.
¤ See also abstract #220 (Phillips et al, 2001), which uses Empirical Bayes to determine the optimal regularisation for the EEG source localisation problem.
Slide 5: EM framework
¤ The objective is to find the hyper-parameters (θ) that maximise the likelihood of the observed data (y), given that there are unobserved variables (x) in the model:
  P(y|θ) = ∫ P(y|x,θ) dx
¤ Introduce a distribution Q(x) to approximate P(x|y,θ).
¤ EM maximises a lower bound on log P(y|θ) (Neal & Hinton, 1998).
¤ The bound is exact when the Kullback-Leibler (KL) divergence between Q(x) and P(x|y,θ) is zero.
¤ The objective function to maximise (the free energy) is:
  F(Q,θ) = log P(y|θ) − ∫ Q(x) log[Q(x)/P(x|y,θ)] dx
         = ∫ Q(x) log P(x,y|θ) dx − ∫ Q(x) log Q(x) dx
¤ Expectation maximisation algorithms maximise F(Q,θ) by alternating:
¤ E-step: maximise F(Q,θ) with respect to Q, keeping θ constant.
¤ M-step: maximise F(Q,θ) with respect to θ, keeping Q constant.
Slide 6: REML objective function
¤ Simple linear generative model: y = Ax + e, where e ~ N(0, C(θ)).
¤ A is M×N, and the M×M covariance matrix C(θ) is parameterised by θ.
¤ θ and x are assumed to be drawn from uniform distributions, so:
  P(x,y|θ) = P(y|x,θ) = (2π)^(−M/2) |C(θ)|^(−1/2) exp(−½ (y−Ax)ᵀ C(θ)⁻¹ (y−Ax))
¤ The approximating distribution is:
  Q(x) = (2π)^(−N/2) |AᵀC(θ)⁻¹A|^(1/2) exp(−½ (x−μ)ᵀ AᵀC(θ)⁻¹A (x−μ)),
  where μ = (AᵀC(θ)⁻¹A)⁻¹ AᵀC(θ)⁻¹ y.
¤ After a bit of rearranging:
  ∫ Q(x) log P(x,y|θ) dx = −(M/2) log(2π) − ½ log|C(θ)| − ½ (y−Aμ)ᵀ C(θ)⁻¹ (y−Aμ) − N/2
  ∫ Q(x) log Q(x) dx = −(N/2) log(2π) + ½ log|AᵀC(θ)⁻¹A| − N/2
¤ Combining these, the objective becomes the Restricted Maximum Likelihood (REML) objective function (see e.g. Harville, 1977):
  F(Q,θ) = −½ log|C(θ)| − ½ (y−Aμ)ᵀ C(θ)⁻¹ (y−Aμ) − ½ log|AᵀC(θ)⁻¹A| + const
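The REML objective can be evaluated directly for a candidate θ. The sketch below is a hedged illustration assuming the simplest possible parameterisation, C(θ) = θI; that choice and the function name are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def reml_objective(theta, A, y):
    """REML objective of slide 6 (up to a constant), assuming C(theta) = theta * I."""
    M, N = A.shape
    Cinv = np.eye(M) / theta                    # C(theta)^-1 for the assumed model
    H = A.T @ Cinv @ A                          # A' C(theta)^-1 A
    mu = np.linalg.solve(H, A.T @ Cinv @ y)     # generalised least-squares estimate
    r = y - A @ mu
    logdet_C = M * np.log(theta)                # log|C(theta)| for C = theta * I
    return (-0.5 * logdet_C
            - 0.5 * r @ Cinv @ r
            - 0.5 * np.linalg.slogdet(H)[1])
```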
Slide 7: Simple REML estimation
¤ Consider a simple model:
  y = Ax + ε,  where ε ~ N(0, λ₁⁻¹ I) and x ~ N(0, λ₂⁻¹ V),
  where A is M×N and V is N×N.
¤ The REML objective function for estimating λ₁ and λ₂ is:
  F(Q,θ) = ½M log λ₁ + ½N log λ₂ − ½λ₁ (y−Ax)ᵀ(y−Ax) − ½λ₂ xᵀV⁻¹x − ½ log|λ₁ AᵀA + λ₂ V⁻¹| + const
¤ A straightforward algorithm for maximising F(Q,θ) is:
  repeat {
    E-step: compute the expectation of x by:
      T = (λ₁⁽ⁿ⁾ AᵀA + λ₂⁽ⁿ⁾ V⁻¹)⁻¹
      x = T λ₁⁽ⁿ⁾ Aᵀ y
    M-step: re-estimate the hyper-parameters (θ):
      λ₁⁽ⁿ⁺¹⁾ = (M − λ₁⁽ⁿ⁾ tr(T AᵀA)) / ((y−Ax)ᵀ(y−Ax))
      λ₂⁽ⁿ⁺¹⁾ = (N − λ₂⁽ⁿ⁾ tr(T V⁻¹)) / (xᵀV⁻¹x)
  } until convergence
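Below is a minimal NumPy sketch of this EM scheme. A fixed number of iterations stands in for a proper convergence test, and all names are illustrative assumptions.

```python
import numpy as np

def simple_reml(A, y, V, lam1=1.0, lam2=1.0, n_iter=50):
    """EM estimation of the precisions lam1, lam2 (slide 7).

    Model: y = A x + e,  e ~ N(0, I/lam1),  x ~ N(0, V/lam2).
    Returns the MAP estimate of x and the estimated hyper-parameters.
    """
    M, N = A.shape
    Vinv = np.linalg.inv(V)
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(n_iter):
        # E-step: posterior covariance T and posterior mean x
        T = np.linalg.inv(lam1 * AtA + lam2 * Vinv)
        x = T @ (lam1 * Aty)
        # M-step: fixed-point updates of the hyper-parameters
        r = y - A @ x
        lam1 = (M - lam1 * np.trace(T @ AtA)) / (r @ r)
        lam2 = (N - lam2 * np.trace(T @ Vinv)) / (x @ Vinv @ x)
    return x, lam1, lam2
```

In practice the loop would be terminated when the REML objective stops increasing, rather than after a fixed number of iterations.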
Slide 8: Algorithm for non-linear REML estimation
A similar iterative algorithm can be obtained for image registration problems, estimating θ by maximising the REML objective function.

  Initialise by assigning starting estimates to x and θ.
  Repeat {
    E-step: compute the MAP estimate of the parameters (x) by repeating {
      Compute A⁽ⁿ⁾ from the basis functions and the gradient of f.
      yᵢ⁽ⁿ⁾ = gᵢ − f(i + Σₖ bᵢₖ xₖ⁽ⁿ⁾)
      r₁ = y⁽ⁿ⁾ᵀ y⁽ⁿ⁾
      r₂ = x⁽ⁿ⁾ᵀ V⁻¹ x⁽ⁿ⁾
      x⁽ⁿ⁺¹⁾ = x⁽ⁿ⁾ − (λ₁ A⁽ⁿ⁾ᵀ A⁽ⁿ⁾ + λ₂ V⁻¹)⁻¹ (λ₁ A⁽ⁿ⁾ᵀ y⁽ⁿ⁾ + λ₂ V⁻¹ x⁽ⁿ⁾)
    } until λ₁ r₁ + λ₂ r₂ converges.
    M-step: re-estimate the hyper-parameters (θ):
      λ₁⁽ⁿ⁺¹⁾ = (M − λ₁⁽ⁿ⁾ tr((λ₁⁽ⁿ⁾ AᵀA + λ₂⁽ⁿ⁾ V⁻¹)⁻¹ AᵀA)) / r₁
      λ₂⁽ⁿ⁺¹⁾ = (N − λ₂⁽ⁿ⁾ tr((λ₁⁽ⁿ⁾ AᵀA + λ₂⁽ⁿ⁾ V⁻¹)⁻¹ V⁻¹)) / r₂
  } until ½M log λ₁ + ½N log λ₂ − ½λ₁ r₁ − ½λ₂ r₂ − ½ log|λ₁ AᵀA + λ₂ V⁻¹| converges.
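A sketch of how this nested scheme might look in NumPy, reusing the hypothetical gauss_newton_step from the slide-3 sketch above; the iteration counts again stand in for convergence tests.

```python
import numpy as np

def register_reml(f, g, B, Vinv, lam1=1.0, lam2=1.0, n_outer=20, n_inner=10):
    """Sketch of the nested REML registration scheme (slide 8)."""
    M, N = B.shape
    x = np.zeros(N)
    for _ in range(n_outer):
        # E-step: MAP estimate of the warp parameters at fixed lam1, lam2
        for _ in range(n_inner):
            x, y, A = gauss_newton_step(x, f, g, B, Vinv, lam1, lam2)
        r1, r2 = y @ y, x @ Vinv @ x        # cost terms (residuals from last inner step)
        # M-step: re-estimate the hyper-parameters from the current fit
        T = np.linalg.inv(lam1 * A.T @ A + lam2 * Vinv)
        lam1 = (M - lam1 * np.trace(T @ A.T @ A)) / r1
        lam2 = (N - lam2 * np.trace(T @ Vinv)) / r2
    return x, lam1, lam2
```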
Slide 9: The problem with smoothness
¤ Images are normally smoothed before registering:
¤ it makes the Taylor series approximation more accurate;
¤ it reduces the confounding effects of differences that cannot be modelled by a small number of basis functions;
¤ it reduces the number of potential local minima.
¤ Unfortunately, the REML model assumes that the residuals are i.i.d. after modelling the covariance among the data.
¤ A more robust, but less efficient, estimator of the warps is obtained using smoothed images (see Worsley & Friston, 1995).
¤ Pure REML is not suitable for this.
¤ Consider: y = Ax + ε, where ε ~ N(0, λ₁⁻¹ S) and x ~ N(0, λ₂⁻¹ V).
¤ S is the autocorrelation among the data - e.g. Gaussian.
¤ The more robust estimator:    x = (λ₁⁽ⁿ⁾ AᵀA + λ₂⁽ⁿ⁾ V⁻¹)⁻¹ λ₁⁽ⁿ⁾ Aᵀ y
¤ The more efficient estimator: x = (λ₁⁽ⁿ⁾ AᵀS⁻¹A + λ₂⁽ⁿ⁾ V⁻¹)⁻¹ λ₁⁽ⁿ⁾ AᵀS⁻¹ y
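The two estimators can be written side by side. In this hedged sketch the Gaussian autocorrelation matrix S is built from an assumed FWHM, and all helper names are made up for illustration.

```python
import numpy as np
from scipy.linalg import toeplitz

def gaussian_autocorr(M, fwhm):
    """Illustrative Gaussian autocorrelation matrix S for an M-voxel 1D image."""
    sigma = fwhm / np.sqrt(8.0 * np.log(2.0))
    return toeplitz(np.exp(-0.5 * (np.arange(M) / sigma) ** 2))

def robust_estimate(A, y, Vinv, lam1, lam2):
    # Ignores the residual autocorrelation S: more robust, less efficient.
    return np.linalg.solve(lam1 * A.T @ A + lam2 * Vinv, lam1 * A.T @ y)

def efficient_estimate(A, y, S, Vinv, lam1, lam2):
    # Whitens the data with S^-1: more efficient, but relies on S being correct.
    Sinv = np.linalg.inv(S)
    return np.linalg.solve(lam1 * A.T @ Sinv @ A + lam2 * Vinv,
                           lam1 * A.T @ Sinv @ y)
```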
Slide 10: Coping with smoothness
¤ A modified algorithm was devised that uses the more robust estimator:
¤ it appears to work, but we cannot prove why.

  repeat {
    "E-step": compute the robust estimate of x by:
      T = (λ₁⁽ⁿ⁾ AᵀA + λ₂⁽ⁿ⁾ V⁻¹)⁻¹
      x = T λ₁⁽ⁿ⁾ Aᵀ y
    "M-step": re-estimate the hyper-parameters (θ):
      λ₁⁽ⁿ⁺¹⁾ = (tr(S) − λ₁⁽ⁿ⁾ tr(T AᵀSA)) / ((y−Ax)ᵀ S (y−Ax))
      λ₂⁽ⁿ⁺¹⁾ = (N − λ₂⁽ⁿ⁾ tr(T V⁻¹)) / (xᵀV⁻¹x)
  } until convergence

¤ The residual smoothness is likely to differ at the end of each iteration:
¤ the FWHM of the Gaussian smoothness is re-estimated from the residuals before each "M-step" (Poline et al, 1995);
¤ S is computed from this FWHM.
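A hedged sketch of how the modified updates might be coded. S is held fixed here for brevity, whereas the slides re-estimate it from the residual FWHM before each "M-step"; the function name is made up.

```python
import numpy as np

def smoothness_adjusted_reml(A, y, V, S, lam1=1.0, lam2=1.0, n_iter=50):
    """Heuristic scheme of slide 10: robust E-step, smoothness-aware M-step."""
    N = A.shape[1]
    Vinv = np.linalg.inv(V)
    for _ in range(n_iter):
        # "E-step": robust estimate that ignores the autocorrelation S
        T = np.linalg.inv(lam1 * A.T @ A + lam2 * Vinv)
        x = T @ (lam1 * A.T @ y)
        r = y - A @ x
        # "M-step": hyper-parameter updates that do account for S
        lam1 = (np.trace(S) - lam1 * np.trace(T @ A.T @ S @ A)) / (r @ S @ r)
        lam2 = (N - lam2 * np.trace(T @ Vinv)) / (x @ Vinv @ x)
    return x, lam1, lam2
```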
Slide 11: Laplace approximations & residual smoothness
¤ Laplace approximations involve modelling a non-Gaussian distribution with a Gaussian:
¤ this assumes a quadratic form for the errors;
¤ but the sum-of-squares difference is not quadratic in the displacement;
¤ so the residual variance under-estimates the displacement errors for larger displacements.
¤ This is partially compensated for using a correction derived from the estimated smoothness of the residuals.
¤ It can be illustrated by computing the sum-of-squares difference between an image and itself translated by different amounts.
¤ The ad hoc correction involves multiplying by the square of the estimated FWHM of the residuals.
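The illustration can be reproduced with a few lines of NumPy; the smooth test image and the range of translations below are made-up values.

```python
import numpy as np

# Sum-of-squares difference between a smooth 1D image and shifted copies of itself.
x = np.linspace(0.0, 10.0, 512)
img = np.exp(-0.5 * ((x - 5.0) / 0.8) ** 2)      # made-up smooth test image

shifts = np.linspace(0.0, 3.0, 31)
sse = np.array([np.sum((img - np.interp(x - s, x, img)) ** 2) for s in shifts])

# For small shifts the difference grows roughly quadratically with displacement;
# for larger shifts it levels off, so a quadratic (Gaussian) error model
# under-estimates the errors associated with large displacements.
print(np.round(sse, 3))
```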
Slide 12: Discussion
¤ Still early days...
¤ Evaluations so far are only on simulated 1D data - the approach appears to work well.
¤ The correction for smoothness may need refining.
¤ The current version works - but we do not know why.
¤ Local minima are reduced because the algorithm determines a good balance among the hyper-parameters at each iteration.
¤ It automatically increases the model complexity as the fit improves (see e.g. Lester and Arridge, 1999).
¤ The future...
¤ Modifications for non-linear regularisation methods.
¤ More complex models of shape variability using more hyper-parameters.
¤ Better variability estimates by simultaneously registering many images.
¤ This would give a more complete picture of the structural variability expected among a population - information that subsequent image warping procedures could use to become more accurate.
Slide 13: References
¤ J. Ashburner and K. J. Friston. "Nonlinear spatial normalization using basis functions". Human Brain Mapping 7(4):254-266, 1999.
¤ D. A. Harville. "Maximum likelihood approaches to variance component estimation and to related problems". Journal of the American Statistical Association 72(358):320-338, 1977.
¤ R. M. Neal and G. E. Hinton. "A view of the EM algorithm that justifies incremental, sparse and other variants". In Learning in Graphical Models, edited by M. I. Jordan, 1998.
¤ C. Phillips, M. D. Rugg and K. J. Friston. "Systematic noise regularisation for linear inverse solution of the source localisation problem in EEG". Abstract No. 220, HBM2001, Brighton, UK.
¤ J.-B. Poline, K. J. Worsley, A. P. Holmes, R. S. J. Frackowiak and K. J. Friston. "Estimating smoothness in statistical parametric maps: variability of p values". JCAT 19(5):788-796, 1995.
¤ K. J. Worsley and K. J. Friston. "Analysis of fMRI time-series revisited - again". NeuroImage 2:173-181, 1995.
¤ H. Lester and S. R. Arridge. "A survey of hierarchical non-linear medical image registration". Pattern Recognition 32:129-149, 1999.
¤ Matlab code illustrating a simple 1D example is available from: http://www.fil.ion.ucl.ac.uk/~john/misc
¤ Many thanks to Keith Worsley and Tom Nichols for pointing us to the REML and Empirical Bayes literature, and to Will Penny for many useful discussions.