Segmentation
C. Phillips, Institut Montefiore, ULg, 2006
Definition
In image analysis, segmentation is the partition of a digital image into multiple regions (sets of pixels), according to some criterion. The goal of segmentation is typically to locate certain objects of interest which may be depicted in the image. Segmentation criteria can be arbitrarily complex, and take into account global as well as local criteria. A common requirement is that each region must be connected in some sense.
A simple example of segmentation is thresholding a grayscale image with a fixed threshold t: each pixel p is assigned to one of two classes, P0 or P1, depending on whether I(p) < t or I(p) ≥ t (e.g. t = 0.5).
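As a concrete illustration, a minimal NumPy sketch of fixed-threshold segmentation; the image array and threshold value are arbitrary examples, not from the slides:

```python
import numpy as np

def threshold_segment(image, t=0.5):
    """Fixed-threshold segmentation: pixels with I(p) < t go to class P0,
    pixels with I(p) >= t go to class P1 (returned as a boolean mask)."""
    return image >= t

# Illustrative use on a synthetic grayscale "image" with values in [0, 1]
img = np.random.rand(128, 128)
mask = threshold_segment(img, t=0.5)
print(mask.mean())  # fraction of pixels assigned to class P1
```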
Example: medical imaging...
How should the threshold be chosen?
Goal of brain image segmentation
Split the head volume into its "main" components:
- gray matter (GM)
- white matter (WM)
- cerebrospinal fluid (CSF)
- the rest/others (e.g. tumour)
Segmentation approaches
Manual segmentation: an operator classifies the voxels manually.
Semi-automatic segmentation: an operator defines a set of parameters that are passed to an algorithm (example: threshold at t = 200).
Automatic segmentation: no operator intervention; objective and reproducible.
Intensity-based segmentation
Model the histogram of the image!
Segmentation - Mixture Model
Intensities are modelled by a mixture of K Gaussian distributions, parameterised by:
- means
- variances
- mixing proportions
Segmentation - Algorithm
1. Start from initial estimates of the belonging probabilities.
2. Compute the Gaussian parameters from the belonging probabilities.
3. Compute the belonging probabilities from the Gaussian parameters.
4. Converged? If no, return to step 2; if yes, stop (see the sketch below).
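This loop is exactly the alternation performed by an Expectation-Maximisation fit of a Gaussian mixture. A minimal NumPy sketch, assuming a 1-D vector of voxel intensities `y` and a fixed iteration count; the initialisation and stopping rule are simplistic illustrations, not the actual implementation:

```python
import numpy as np

def fit_mog(y, K=3, n_iter=50):
    """EM for a K-component Gaussian mixture on voxel intensities y (1-D array)."""
    # Crude starting estimates: spread the means over the intensity range
    mu = np.linspace(y.min(), y.max(), K)
    var = np.full(K, y.var())
    gamma = np.full(K, 1.0 / K)                       # mixing proportions
    for _ in range(n_iter):
        # E-step: belonging probabilities from the current Gaussian parameters
        like = gamma * np.exp(-(y[:, None] - mu) ** 2 / (2 * var)) \
                     / np.sqrt(2 * np.pi * var)       # shape (N, K)
        q = like / (like.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: Gaussian parameters from the belonging probabilities
        nk = q.sum(axis=0)
        mu = (q * y[:, None]).sum(axis=0) / nk
        var = (q * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        gamma = nk / len(y)
    return mu, var, gamma, q
```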
Segmentation - Problems
Noise and the partial volume effect.
Segmentation - Problems: Intensity bias field
MR images are corrupted by a smooth intensity non-uniformity (bias).
[Figure: image with bias artefact vs. corrected image]
Segmentation - Priors
Overlay prior belonging probability maps to assist the segmentation:
- The prior probability of each voxel being of a particular tissue type is derived from segmented images of 151 subjects, assumed to be representative.
- Requires initial registration to standard space.
Unified approach: segmentation-correction-registration
- Bias correction informs segmentation.
- Registration informs segmentation.
- Segmentation informs bias correction.
- Bias correction informs registration.
- Segmentation informs registration.
Unified Segmentation
The solution to this circularity is to put everything in the same generative model.
- A MAP solution is found by repeatedly alternating among classification, bias correction and registration steps.
The generative model involves:
- a Mixture of Gaussians (MOG)
- a bias correction component
- a warping (non-linear registration) component
Gaussian Probability Density
If intensities are assumed to be Gaussian with mean μ_k and variance σ_k², then the probability of a value y_i is:

P(y_i \mid c_i = k, \mu_k, \sigma_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(y_i - \mu_k)^2}{2\sigma_k^2}\right)
Non-Gaussian Probability Distribution
A non-Gaussian probability density function can be modelled by a Mixture of Gaussians (MOG):

P(y_i) = \sum_{k=1}^{K} \gamma_k \, P(y_i \mid c_i = k, \mu_k, \sigma_k)

where the mixing proportions γ_k are positive and sum to one.
Mixing Proportions
The mixing proportion γ_k represents the prior probability of a voxel being drawn from class k, irrespective of its intensity, i.e. P(c_i = k | γ_k) = γ_k. So:

P(y_i, c_i = k \mid \mu_k, \sigma_k, \gamma_k) = P(y_i \mid c_i = k, \mu_k, \sigma_k)\, \gamma_k
Non-Gaussian Intensity Distributions
Multiple Gaussians per tissue class allow non-Gaussian intensity distributions to be modelled.
Probability of Whole Dataset
If the voxels are assumed to be independent, then the probability of the whole image is the product of the probabilities of each voxel:

P(\mathbf{y}) = \prod_i P(y_i)

It is often easier to work with negative log-probabilities:

E = -\log P(\mathbf{y}) = -\sum_i \log P(y_i)
Modelling a Bias Field
A bias field is included, such that the required scaling at voxel i, parameterised by β, is ρ_i(β).
- Replace the means μ_k by μ_k / ρ_i(β).
- Replace the variances σ_k² by (σ_k / ρ_i(β))².
Modelling a Bias Field
After rearranging:

P(y_i \mid c_i = k, \mu_k, \sigma_k, \beta) = \frac{\rho_i(\beta)}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(\rho_i(\beta)\, y_i - \mu_k)^2}{2\sigma_k^2}\right)

i.e. the model can equivalently be read as the Gaussian applied to the bias-corrected intensities ρ_i(β) y_i.
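To make the bias model concrete: a common choice (hypothetical here, since the slides do not specify the basis) is to write the log of the field as a linear combination of low-frequency cosine basis functions, so that ρ_i(β) is smooth and strictly positive. A 2-D sketch:

```python
import numpy as np

def bias_field(beta, shape):
    """Smooth multiplicative bias field rho(beta) on a 2-D grid.

    Modelled as the exponential of a linear combination of low-frequency
    cosine basis functions, with beta holding the M x M coefficients.
    (Illustrative parameterisation, not necessarily the one used in 2006.)
    """
    ny, nx = shape
    Y, X = np.mgrid[0:ny, 0:nx]
    M = int(round(np.sqrt(len(beta))))   # assume a square coefficient grid
    log_field = np.zeros(shape)
    for m in range(M):
        for n in range(M):
            log_field += beta[m * M + n] \
                * np.cos(np.pi * m * (Y + 0.5) / ny) \
                * np.cos(np.pi * n * (X + 0.5) / nx)
    return np.exp(log_field)             # exp keeps the scaling positive

rho = bias_field(0.01 * np.random.randn(9), (64, 64))
y = np.random.rand(64, 64)               # stand-in image
corrected = rho * y                       # bias-corrected intensities rho_i * y_i
```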
Tissue Probability Maps
Tissue probability maps (TPMs) are used instead of the proportion of voxels in each Gaussian as the prior.
ICBM Tissue Probabilistic Atlases. These tissue probability maps are kindly provided by the International Consortium for Brain Mapping, John C. Mazziotta and Arthur W. Toga.
"Mixing Proportions"
Tissue probability maps b_ik for each class k are available. The probability of obtaining class k at voxel i, given the weights γ, is then:

P(c_i = k \mid \gamma) = \frac{\gamma_k\, b_{ik}}{\sum_j \gamma_j\, b_{ij}}
Deforming the Tissue Probability Maps
Tissue probability images are deformed according to parameters α. The probability of obtaining class k at voxel i, given the weights γ and parameters α, is then:

P(c_i = k \mid \gamma, \alpha) = \frac{\gamma_k\, b_{ik}(\alpha)}{\sum_j \gamma_j\, b_{ij}(\alpha)}
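In code, the deformed TPM values and the weights combine into the voxel-wise class prior exactly as in the formula above; a sketch (array names and shapes are assumptions):

```python
import numpy as np

def class_prior(tpm, gamma):
    """P(c_i = k | gamma, alpha) from deformed tissue probability maps.

    tpm   : (N, K) array of TPM values b_ik(alpha), one row per voxel
    gamma : (K,) vector of mixing weights
    Returns gamma_k * b_ik / sum_j gamma_j * b_ij, each row summing to one.
    """
    w = tpm * gamma                       # broadcast weights over voxels
    return w / w.sum(axis=1, keepdims=True)
```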
The Extended Model: The Objective Function
By combining the modified P(c_i = k | γ, α) and P(y_i | c_i = k, μ, σ, β), the overall objective function E becomes:

E = -\sum_i \log\!\left[ \rho_i(\beta) \sum_{k=1}^{K} \frac{\gamma_k\, b_{ik}(\alpha)}{\sum_j \gamma_j\, b_{ij}(\alpha)} \cdot \frac{1}{\sqrt{2\pi\sigma_k^2}} \exp\!\left(-\frac{(\rho_i(\beta)\, y_i - \mu_k)^2}{2\sigma_k^2}\right) \right]
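Putting the pieces together numerically; a sketch that evaluates E for given parameters (array names and shapes are assumptions):

```python
import numpy as np

def objective(y, rho, tpm, gamma, mu, var):
    """E = -sum_i log[ rho_i * sum_k P(c_i=k|gamma,alpha) * N(rho_i*y_i; mu_k, var_k) ].

    y, rho : (N,) voxel intensities and bias-field values rho_i(beta)
    tpm    : (N, K) deformed tissue probability map values b_ik(alpha)
    gamma  : (K,) mixing weights; mu, var : (K,) Gaussian parameters
    """
    prior = tpm * gamma
    prior /= prior.sum(axis=1, keepdims=True)        # P(c_i = k | gamma, alpha)
    z = (rho * y)[:, None]                           # bias-corrected intensities
    like = np.exp(-(z - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return -(np.log(rho) + np.log((prior * like).sum(axis=1))).sum()
```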
Optimisation
The "best" parameters are those that minimise this objective function; optimisation involves finding them. Begin with starting estimates, and repeatedly change them so that the objective function decreases each time.
Schematic of optimisation (Iterated Conditional Modes)
Repeat until convergence:
- Hold γ, μ, σ² and α constant, and minimise E w.r.t. β (Levenberg-Marquardt strategy, using dE/dβ and d²E/dβ²).
- Hold γ, μ, σ² and β constant, and minimise E w.r.t. α (Levenberg-Marquardt strategy, using dE/dα and d²E/dα²).
- Hold α and β constant, and minimise E w.r.t. γ, μ and σ² (Expectation Maximisation strategy).
Levenberg-Marquardt Optimisation
LM optimisation is used for the nonlinear registration and bias correction components. It requires the first and second derivatives of the objective function E. Parameters β and α are updated by:

\theta^{(n+1)} = \theta^{(n)} - \left(\mathbf{H} + \lambda \mathbf{I}\right)^{-1} \mathbf{g}

where g and H are the first and second derivatives of E. Increase λ to improve stability (at the expense of a slower convergence).
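A sketch of this update rule, assuming the gradient and Hessian of E are available (names are illustrative):

```python
import numpy as np

def lm_step(theta, grad, hess, lam):
    """One Levenberg-Marquardt update of parameters theta (beta or alpha).

    grad : first derivative dE/dtheta
    hess : second derivative d2E/dtheta2
    lam  : stabiliser; larger values give safer but slower steps
    """
    H = hess + lam * np.eye(len(theta))   # damp the Hessian's diagonal
    return theta - np.linalg.solve(H, grad)
```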
EM is used to update μ, σ² and γ
For iteration (n), alternate between:
- E-step: estimate the belonging probabilities by:

q_{ik}^{(n)} = \frac{P(c_i = k \mid \gamma^{(n)}, \alpha)\, P(y_i \mid c_i = k, \mu_k^{(n)}, \sigma_k^{(n)}, \beta)}{\sum_j P(c_i = j \mid \gamma^{(n)}, \alpha)\, P(y_i \mid c_i = j, \mu_j^{(n)}, \sigma_j^{(n)}, \beta)}

- M-step: set (γ, μ, σ²)^{(n+1)} to values that reduce the EM bound on E:

-\sum_i \sum_k q_{ik}^{(n)} \log P(y_i, c_i = k \mid \gamma, \mu_k, \sigma_k, \beta, \alpha)
Bayesian Formulation
Bayes' rule states: p(q|e) ∝ p(e|q) p(q)
- p(q|e) is the a posteriori probability of parameters q given errors e.
- p(e|q) is the likelihood of observing errors e given parameters q.
- p(q) is the a priori probability of parameters q.
The maximum a posteriori (MAP) estimate maximises p(q|e). Maximising p(q|e) is equivalent to minimising the Gibbs potential of the posterior distribution H(q|e), where H(q|e) = -log p(q|e). The posterior potential is the sum of the likelihood and prior potentials:
H(q|e) = H(e|q) + H(q) + c
- The likelihood potential H(e|q) = -log p(e|q) is based upon the sum of squared differences between the images.
- The prior potential H(q) = -log p(q) penalises unlikely deformations.
Linear Regularisation
Some bias fields and distortions are more probable (a priori) than others. This is encoded using Bayes' rule:

P(a \mid \mathbf{y}) \propto P(\mathbf{y} \mid a)\, P(a)

The prior probability distributions can be modelled by multivariate normal distributions:
- mean vectors m_a and m_b
- covariance matrices S_a and S_b
- -\log P(a) = (a - m_a)^T S_a^{-1} (a - m_a) + \text{const}
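The quadratic penalty is easy to evaluate directly; a sketch for the prior on a (the same form applies to b):

```python
import numpy as np

def prior_penalty(a, m_a, S_a):
    """-log P(a), up to a constant, for a multivariate normal prior,
    mirroring the slide's (a - m_a)^T S_a^{-1} (a - m_a) form."""
    d = a - m_a
    return d @ np.linalg.solve(S_a, d)    # avoids forming S_a^{-1} explicitly
```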
Voxels are assumed independent!
Hidden Markov Random Field
Voxels are NOT independent: GM voxels are surrounded by other GM voxels, at least on one side. Model the intensity and classification of the image voxels by two random fields:
- a visible field y for the intensities
- a hidden field c for the classifications
Modify the cost function E by adding an energy term U_mrf; at each voxel, the six neighbouring voxels are used to build U_mrf, imposing local spatial constraints (see the sketch below).
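The slide does not give U_mrf explicitly; a Potts-style sketch over the 6-neighbourhood of a 3-D label volume, counting neighbours with a different label (a hypothetical but standard form):

```python
import numpy as np

def u_mrf(labels, beta=1.0):
    """Potts-style MRF energy over the 6-neighbourhood of a 3-D label volume.

    For each voxel, counts the neighbours along the three axes whose label
    differs; higher energy means a less spatially coherent classification.
    """
    energy = np.zeros(labels.shape)
    for axis in range(3):
        diff = (np.diff(labels, axis=axis) != 0).astype(float)
        # Each differing face contributes to both voxels it separates
        lo = [(0, 0)] * 3
        lo[axis] = (1, 0)
        hi = [(0, 0)] * 3
        hi[axis] = (0, 1)
        energy += np.pad(diff, lo) + np.pad(diff, hi)
    return beta * energy
```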
Hidden Markov Random Field
[Figure: T1 image and T2 image]
Hidden Markov Random Field: results
[Figures: white matter, gray matter and CSF segmentations, comparing T1 & T2 (MoG + HMRF) against T1 only (MoG only)]
Perspectives
Multimodal segmentation: one image is good, but two are better! Model the joint histogram using multi-dimensional normal distributions.
Tumour detection: contrasted images to modify the prior images; automatic detection of outliers?
Thank you for your attention!