Segmentation
C. Phillips, Institut Montefiore, ULg, 2006
Definition
In image analysis, segmentation is the partition of a digital image into multiple regions (sets of pixels), according to some criterion. The goal of segmentation is typically to locate certain objects of interest which may be depicted in the image. Segmentation criteria can be arbitrarily complex, and take into account global as well as local criteria. A common requirement is that each region must be connected in some sense.
A simple example of segmentation is thresholding a grayscale image with a fixed threshold t: each pixel p is assigned to one of two classes, P0 or P1, depending on whether I(p) < t or I(p) ≥ t (e.g. t = 0.5).
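As a concrete illustration, a minimal NumPy sketch of fixed-threshold segmentation; the image array and threshold value are arbitrary examples, not from the slides:

```python
import numpy as np

def threshold_segment(image, t=0.5):
    """Fixed-threshold segmentation: pixels with I(p) < t go to class P0,
    pixels with I(p) >= t go to class P1 (returned as a boolean mask)."""
    return image >= t

# Illustrative use on a synthetic grayscale "image" with values in [0, 1]
img = np.random.rand(128, 128)
mask = threshold_segment(img, t=0.5)
print(mask.mean())  # fraction of pixels assigned to class P1
```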
Example: medical imaging...
How should the threshold be chosen?
Goal of brain image segmentation
Split the head volume into its "main" components:
- gray matter (GM)
- white matter (WM)
- cerebrospinal fluid (CSF)
- the rest/others (e.g. tumour)
Segmentation approaches
Manual segmentation: an operator classifies the voxels manually.
Semi-automatic segmentation: an operator defines a set of parameters that are passed to an algorithm (example: threshold at t = 200).
Automatic segmentation: no operator intervention; objective and reproducible.
Intensity-based segmentation
Model the histogram of the image!
Segmentation - Mixture Model
Intensities are modelled by a mixture of K Gaussian distributions, parameterised by:
- means
- variances
- mixing proportions
Segmentation - Algorithm
1. Start from initial estimates of the belonging probabilities.
2. Compute the Gaussian parameters from the belonging probabilities.
3. Compute the belonging probabilities from the Gaussian parameters.
4. Converged? If no, return to step 2; if yes, stop (see the sketch below).
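This loop is exactly the alternation performed by an Expectation-Maximisation fit of a Gaussian mixture. A minimal NumPy sketch, assuming a 1-D vector of voxel intensities `y` and a fixed iteration count; the initialisation and stopping rule are simplistic illustrations, not the actual implementation:

```python
import numpy as np

def fit_mog(y, K=3, n_iter=50):
    """EM for a K-component Gaussian mixture on voxel intensities y (1-D array)."""
    # Crude starting estimates: spread the means over the intensity range
    mu = np.linspace(y.min(), y.max(), K)
    var = np.full(K, y.var())
    gamma = np.full(K, 1.0 / K)                       # mixing proportions
    for _ in range(n_iter):
        # E-step: belonging probabilities from the current Gaussian parameters
        like = gamma * np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) \
                     / np.sqrt(2 * np.pi * var)       # shape (N, K)
        q = like / (like.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: Gaussian parameters from the belonging probabilities
        nk = q.sum(axis=0)
        mu = (q * y[:, None]).sum(axis=0) / nk
        var = (q * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        gamma = nk / len(y)
    return mu, var, gamma, q
```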
Segmentation - Problems
Noise and the partial volume effect.
Segmentation - Problems: Intensity bias field
MR images are corrupted by a smooth intensity non-uniformity (bias).
[Figure: image with bias artefact vs. corrected image]
Segmentation - Priors
Overlay prior belonging probability maps to assist the segmentation:
- The prior probability of each voxel being of a particular tissue type is derived from segmented images of 151 subjects, assumed to be representative.
- Requires initial registration to standard space.
Unified approach: segmentation-correction-registration
- Bias correction informs segmentation.
- Registration informs segmentation.
- Segmentation informs bias correction.
- Bias correction informs registration.
- Segmentation informs registration.
Unified Segmentation
The solution to this circularity is to put everything in the same generative model.
- A MAP solution is found by repeatedly alternating among classification, bias correction and registration steps.
The generative model involves:
- a Mixture of Gaussians (MOG)
- a bias correction component
- a warping (non-linear registration) component
Gaussian Probability Density
If intensities are assumed to be Gaussian with mean μ_k and variance σ_k², then the probability of a value y_i is:

P(y_i \mid c_i = k, \mu_k, \sigma_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(y_i - \mu_k)^2}{2\sigma_k^2}\right)
Non-Gaussian Probability Distribution
A non-Gaussian probability density function can be modelled by a Mixture of Gaussians (MOG):

P(y_i) = \sum_{k=1}^{K} \gamma_k \, P(y_i \mid c_i = k, \mu_k, \sigma_k)

where the mixing proportions γ_k are positive and sum to one.
Mixing Proportions
The mixing proportion γ_k represents the prior probability of a voxel being drawn from class k, irrespective of its intensity, i.e. P(c_i = k | γ_k) = γ_k. So:

P(y_i, c_i = k \mid \mu_k, \sigma_k, \gamma_k) = P(y_i \mid c_i = k, \mu_k, \sigma_k)\, \gamma_k
Non-Gaussian Intensity Distributions
Multiple Gaussians per tissue class allow non-Gaussian intensity distributions to be modelled.
Probability of Whole Dataset
If the voxels are assumed to be independent, then the probability of the whole image is the product of the probabilities of each voxel:

P(\mathbf{y}) = \prod_i P(y_i)

It is often easier to work with negative log-probabilities:

E = -\log P(\mathbf{y}) = -\sum_i \log P(y_i)
Modelling a Bias Field
A bias field is included, such that the required scaling at voxel i, parameterised by β, is ρ_i(β).
- Replace the means μ_k by μ_k / ρ_i(β).
- Replace the variances σ_k² by (σ_k / ρ_i(β))².
Modelling a Bias Field
After rearranging:

P(y_i \mid c_i = k, \mu_k, \sigma_k, \beta) = \frac{\rho_i(\beta)}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(\rho_i(\beta)\, y_i - \mu_k)^2}{2\sigma_k^2}\right)

i.e. the model can equivalently be read as the Gaussian applied to the bias-corrected intensities ρ_i(β) y_i.
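To make the bias model concrete: a common choice (hypothetical here, since the slides do not specify the basis) is to write the log of the field as a linear combination of low-frequency cosine basis functions, so that ρ_i(β) is smooth and strictly positive. A 2-D sketch:

```python
import numpy as np

def bias_field(beta, shape):
    """Smooth multiplicative bias field rho(beta) on a 2-D grid.

    Modelled as the exponential of a linear combination of low-frequency
    cosine basis functions, with beta holding the M x M coefficients.
    (Illustrative parameterisation, not necessarily the one used in 2006.)
    """
    ny, nx = shape
    Y, X = np.mgrid[0:ny, 0:nx]
    M = int(round(np.sqrt(len(beta))))   # assume a square coefficient grid
    log_field = np.zeros(shape)
    for m in range(M):
        for n in range(M):
            log_field += beta[m * M + n] \
                * np.cos(np.pi * m * (Y + 0.5) / ny) \
                * np.cos(np.pi * n * (X + 0.5) / nx)
    return np.exp(log_field)             # exp keeps the scaling positive

rho = bias_field(0.01 * np.random.randn(9), (64, 64))
y = np.random.rand(64, 64)               # stand-in image
corrected = rho * y                       # bias-corrected intensities rho_i * y_i
```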
Tissue Probability Maps
Tissue probability maps (TPMs) are used instead of the proportion of voxels in each Gaussian as the prior.
ICBM Tissue Probabilistic Atlases. These tissue probability maps are kindly provided by the International Consortium for Brain Mapping, John C. Mazziotta and Arthur W. Toga.
"Mixing Proportions"
Tissue probability maps b_ik for each class k are available. The probability of obtaining class k at voxel i, given the weights γ, is then:

P(c_i = k \mid \gamma) = \frac{\gamma_k\, b_{ik}}{\sum_j \gamma_j\, b_{ij}}
Deforming the Tissue Probability Maps
Tissue probability images are deformed according to parameters α. The probability of obtaining class k at voxel i, given the weights γ and parameters α, is then:

P(c_i = k \mid \gamma, \alpha) = \frac{\gamma_k\, b_{ik}(\alpha)}{\sum_j \gamma_j\, b_{ij}(\alpha)}
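In code, the deformed TPM values and the weights combine into the voxel-wise class prior exactly as in the formula above; a sketch (array names and shapes are assumptions):

```python
import numpy as np

def class_prior(tpm, gamma):
    """P(c_i = k | gamma, alpha) from deformed tissue probability maps.

    tpm   : (N, K) array of TPM values b_ik(alpha), one row per voxel
    gamma : (K,) vector of mixing weights
    Returns gamma_k * b_ik / sum_j gamma_j * b_ij, each row summing to one.
    """
    w = tpm * gamma                       # broadcast weights over voxels
    return w / w.sum(axis=1, keepdims=True)
```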
The Extended Model: The Objective Function
By combining the modified P(c_i = k | γ, α) and P(y_i | c_i = k, μ, σ, β), the overall objective function E becomes:

E = -\sum_i \log\!\left[ \rho_i(\beta) \sum_{k=1}^{K} \frac{\gamma_k\, b_{ik}(\alpha)}{\sum_j \gamma_j\, b_{ij}(\alpha)} \cdot \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(\rho_i(\beta)\, y_i - \mu_k)^2}{2\sigma_k^2}\right) \right]
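Putting the pieces together numerically; a sketch that evaluates E for given parameters (array names and shapes are assumptions):

```python
import numpy as np

def objective(y, rho, tpm, gamma, mu, var):
    """E = -sum_i log[ rho_i * sum_k P(c_i=k|gamma,alpha) * N(rho_i*y_i; mu_k, var_k) ].

    y, rho : (N,) voxel intensities and bias-field values rho_i(beta)
    tpm    : (N, K) deformed tissue probability map values b_ik(alpha)
    gamma  : (K,) mixing weights; mu, var : (K,) Gaussian parameters
    """
    prior = tpm * gamma
    prior /= prior.sum(axis=1, keepdims=True)        # P(c_i = k | gamma, alpha)
    z = (rho * y)[:, None]                           # bias-corrected intensities
    like = np.exp(-(z - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return -(np.log(rho) + np.log((prior * like).sum(axis=1))).sum()
```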
Optimisation
The "best" parameters are those that minimise this objective function; optimisation involves finding them. Begin with starting estimates, and repeatedly change them so that the objective function decreases each time.
Schematic of optimisation (Iterated Conditional Modes)
Repeat until convergence:
- Hold γ, μ, σ² and α constant, and minimise E w.r.t. β (Levenberg-Marquardt strategy, using dE/dβ and d²E/dβ²).
- Hold γ, μ, σ² and β constant, and minimise E w.r.t. α (Levenberg-Marquardt strategy, using dE/dα and d²E/dα²).
- Hold α and β constant, and minimise E w.r.t. γ, μ and σ² (Expectation Maximisation strategy).
Levenberg-Marquardt Optimisation
LM optimisation is used for the nonlinear registration and bias correction components. It requires the first and second derivatives of the objective function E. Parameters β and α are updated by:

\theta^{(n+1)} = \theta^{(n)} - \left(\mathbf{H} + \lambda \mathbf{I}\right)^{-1} \mathbf{g}

where g and H are the first and second derivatives of E. Increase λ to improve stability (at the expense of a slower convergence).
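A sketch of this update rule, assuming the gradient and Hessian of E are available (names are illustrative):

```python
import numpy as np

def lm_step(theta, grad, hess, lam):
    """One Levenberg-Marquardt update of parameters theta (beta or alpha).

    grad : first derivative dE/dtheta
    hess : second derivative d2E/dtheta2
    lam  : stabiliser; larger values give safer but slower steps
    """
    H = hess + lam * np.eye(len(theta))   # damp the Hessian's diagonal
    return theta - np.linalg.solve(H, grad)
```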
EM is used to update μ, σ² and γ
For iteration (n), alternate between:
- E-step: estimate the belonging probabilities by:

q_{ik}^{(n)} = \frac{P(c_i = k \mid \gamma^{(n)}, \alpha)\, P(y_i \mid c_i = k, \mu_k^{(n)}, \sigma_k^{(n)}, \beta)}{\sum_j P(c_i = j \mid \gamma^{(n)}, \alpha)\, P(y_i \mid c_i = j, \mu_j^{(n)}, \sigma_j^{(n)}, \beta)}

- M-step: set (γ, μ, σ²)^{(n+1)} to values that reduce the EM bound on E:

-\sum_i \sum_k q_{ik}^{(n)} \log P(y_i, c_i = k \mid \gamma, \mu_k, \sigma_k, \beta, \alpha)
Bayesian Formulation
Bayes' rule states: p(q|e) ∝ p(e|q) p(q)
- p(q|e) is the a posteriori probability of parameters q given errors e.
- p(e|q) is the likelihood of observing errors e given parameters q.
- p(q) is the a priori probability of parameters q.
The maximum a posteriori (MAP) estimate maximises p(q|e). Maximising p(q|e) is equivalent to minimising the Gibbs potential of the posterior distribution H(q|e), where H(q|e) = -log p(q|e). The posterior potential is the sum of the likelihood and prior potentials:
H(q|e) = H(e|q) + H(q) + c
- The likelihood potential H(e|q) = -log p(e|q) is based upon the sum of squared differences between the images.
- The prior potential H(q) = -log p(q) penalises unlikely deformations.
Linear Regularisation
Some bias fields and distortions are more probable (a priori) than others. This is encoded using Bayes' rule:

P(a \mid \mathbf{y}) \propto P(\mathbf{y} \mid a)\, P(a)

The prior probability distributions can be modelled by multivariate normal distributions:
- mean vectors m_a and m_b
- covariance matrices S_a and S_b
- -\log P(a) = (a - m_a)^T S_a^{-1} (a - m_a) + \text{const}
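The quadratic penalty is easy to evaluate directly; a sketch for the prior on a (the same form applies to b):

```python
import numpy as np

def prior_penalty(a, m_a, S_a):
    """-log P(a), up to a constant, for a multivariate normal prior,
    mirroring the slide's (a - m_a)^T S_a^{-1} (a - m_a) form."""
    d = a - m_a
    return d @ np.linalg.solve(S_a, d)    # avoids forming S_a^{-1} explicitly
```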
Voxels are assumed independent!
Hidden Markov Random Field
Voxels are NOT independent: GM voxels are surrounded by other GM voxels, at least on one side. Model the intensity and classification of the image voxels by two random fields:
- a visible field y for the intensities
- a hidden field c for the classifications
Modify the cost function E by adding an energy term U_mrf; at each voxel, the six neighbouring voxels are used to build U_mrf, imposing local spatial constraints (see the sketch below).
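The slide does not give U_mrf explicitly; a Potts-style sketch over the 6-neighbourhood of a 3-D label volume, counting neighbours with a different label (a hypothetical but standard form):

```python
import numpy as np

def u_mrf(labels, beta=1.0):
    """Potts-style MRF energy over the 6-neighbourhood of a 3-D label volume.

    For each voxel, counts the neighbours along the three axes whose label
    differs; higher energy means a less spatially coherent classification.
    """
    energy = np.zeros(labels.shape)
    for axis in range(3):
        diff = (np.diff(labels, axis=axis) != 0).astype(float)
        # Each differing face contributes to both voxels it separates
        lo = [(0, 0)] * 3
        lo[axis] = (1, 0)
        hi = [(0, 0)] * 3
        hi[axis] = (0, 1)
        energy += np.pad(diff, lo) + np.pad(diff, hi)
    return beta * energy
```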
Hidden Markov Random Field
[Figure: T1 image and T2 image]
Hidden Markov Random Field: results
[Figures: white matter, gray matter and CSF segmentations, comparing T1 & T2 (MoG + HMRF) against T1 only (MoG only)]
Perspectives
Multimodal segmentation: one image is good, but two are better! Model the joint histogram using multi-dimensional normal distributions.
Tumour detection: contrasted images to modify the prior images; automatic detection of outliers?
Thank you for your attention!