Dynamical Statistical Shape Priors for Level Set Based Tracking
Dheeraj Singaraju
Reading group: 07/27/06
Segmentation using shape priors
- In the past, level set based segmentation methods have incorporated statistical shape knowledge.
- Statistically learned shape information can deal with missing or misleading information due to noise, clutter or occlusion.
- Such segmentation frameworks can be applied to tracking objects across frames.
The need for dynamic shape priors
- All the previous tracking approaches assume statistical shape priors that are static in time.
- In silhouette data of a walking person, certain poses become more or less likely as time progresses.
- We want to exploit the temporal coherence of the poses a human passes through during the course of an entire action.
Aim of the paper
- This paper deals with learning dynamical statistical shape models for implicitly represented shapes in order to track human motion.
- Such an approach segments an image based both on its intensities and on the segmentations obtained in the previous frames.
- Because temporal shape consistency is enforced, the resulting approach is expected to perform better in the presence of noise and clutter than the previously mentioned approaches.
Features of the proposed algorithm
- Implicit shape representation: the method can deal with shapes of varying topology.
- Intensity based segmentation: the tracking scheme is region based as opposed to edge based.
- Optimization using gradient descent: facilitates an extension to higher dimensional data.
Implicit representation of hypersurfaces
- The level set method propagates hypersurfaces in a domain Ω by evolving an appropriate embedding function φ: Ω → R, where the boundary is the zero level set C = {x ∈ Ω : φ(x) = 0}.
- Advantages of such a representation:
  - The boundary representation does not depend on any parameterization.
  - Topological changes such as merging and splitting can be dealt with.
  - One can easily generalize the framework to higher dimensional data.
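As a concrete illustration of the implicit representation (not from the paper), the following minimal numpy/scipy sketch embeds a binary silhouette as a signed distance function and reads off the contour as its zero level set; all names are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def embed(mask):
    """Signed distance embedding: phi > 0 inside the shape, phi < 0 outside."""
    inside = distance_transform_edt(mask)        # distance to the background, for pixels inside the shape
    outside = distance_transform_edt(1 - mask)   # distance to the shape, for pixels outside it
    return inside - outside

# Two disjoint discs: a single embedding function represents both components,
# so merging or splitting during evolution requires no special bookkeeping.
yy, xx = np.mgrid[0:100, 0:100]
mask = (((xx - 30) ** 2 + (yy - 50) ** 2) < 15 ** 2) | (((xx - 70) ** 2 + (yy - 50) ** 2) < 15 ** 2)
phi = embed(mask.astype(np.uint8))
contour = np.abs(phi) < 1.0   # pixels close to the zero level set
```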
Defining terms to work with
- Shape (φ): a set of closed 2D contours modulo a certain transformation, represented implicitly by the embedding function.
- Transformations (θ): they are essentially problem dependent and can be rigid body transformations, similarity or affine transforms, etc.
- The object of interest is thus the shape subjected to the transformation, i.e. φ evaluated in the coordinate frame defined by θ.
- We want to model the temporal evolution of the shapes separately from that of the transformations.
The Bayesian inference problem
- Assume we are given t consecutive images from an image sequence, where I_{1:t} = {I_1, ..., I_t} denotes the set of images up to time t.
- We then want to maximize the conditional probability P(φ_t, θ_t | I_{1:t}).
- By Bayes' rule this splits into a term measuring the goodness of segmentation of the current image and a term enforcing temporal consistency, while the normalizing factor does not depend on the estimated quantities (see the decomposition below).
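Spelled out, this is the standard decomposition the slide annotates, assuming the current image is conditionally independent of the earlier images given the current shape and transformation:

```latex
P(\phi_t, \theta_t \mid I_{1:t})
= \frac{P(I_t \mid \phi_t, \theta_t)\; P(\phi_t, \theta_t \mid I_{1:t-1})}{P(I_t \mid I_{1:t-1})}
\;\propto\;
\underbrace{P(I_t \mid \phi_t, \theta_t)}_{\text{goodness of segmentation}}\;
\underbrace{P(\phi_t, \theta_t \mid I_{1:t-1})}_{\text{temporal consistency}} .
```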
Assumptions to make life easier
- The images are assumed to be mutually independent.
- The intensities of the shape and the background are modeled as independent samples from two Gaussian distributions with unknown means and unknown variances.
Evaluating the Gaussian models
- We use the Heaviside step function H(φ), with H(φ) = 1 where φ is positive (inside the shape) and H(φ) = 0 where φ is negative (the background), to denote the two regions.
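With this notation, the intensity likelihood takes the usual two-region form; the slide's formula is not reproduced here, so this is the standard form such models take:

```latex
P(I_t \mid \phi_t, \theta_t) \;=\;
\prod_{x:\,\phi>0} \frac{1}{\sqrt{2\pi}\,\sigma_1}\,
  e^{-\frac{(I_t(x)-\mu_1)^2}{2\sigma_1^2}}
\;\;\prod_{x:\,\phi<0} \frac{1}{\sqrt{2\pi}\,\sigma_2}\,
  e^{-\frac{(I_t(x)-\mu_2)^2}{2\sigma_2^2}} .
```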
Simplification of the distribution
- To avoid computational burden, the distributions of the previous states of φ and θ are assumed to be strongly peaked around the maxima of the respective distributions, i.e. around the past estimates φ_{1:t-1} and θ_{1:t-1}.
- If we further assume that the tracking system does not store the previous images but only the past estimates of shape and transformation, then the inference problem reduces to maximizing P(φ_t, θ_t | I_t, φ_{1:t-1}, θ_{1:t-1}) ∝ P(I_t | φ_t, θ_t) P(φ_t, θ_t | φ_{1:t-1}, θ_{1:t-1}).
Distributions for temporal evolution
- We now need to evaluate the prior P(φ_t, θ_t | φ_{1:t-1}, θ_{1:t-1}).
- We break the analysis down into two cases:
  - We assume that the shape and transformation are mutually independent and assume a uniform prior on the transformation parameters.
  - We consider the joint distribution of the shape and transformation parameters.
Shapes and eigenmodes
- It is known that statistical models can be estimated more reliably if the dimensionality of the model and the data is low.
- The Bayesian inference problem is now formulated in low dimensions, within the subspace spanned by the largest principal eigenmodes of a set of sample shapes.
- The training sequence is therefore used for the following:
  - Extraction of the eigenmodes of the sample shapes.
  - Learning of dynamical models for the low dimensional representation of the implicit shapes.
Projection into the space of shapes
- Let φ_1, ..., φ_N be a temporal sequence of training shapes. We denote the mean shape as φ_0 and the n most significant eigenmodes as ψ_1, ..., ψ_n.
- Given the mean and the n most significant eigenmodes, we can approximate an arbitrary shape by a shape vector α = (α_1, ..., α_n) as φ_α = φ_0 + Σ_{i=1}^n α_i ψ_i.
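A minimal sketch of this projection step, assuming the training embeddings are aligned and stored as flattened rows of a matrix (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def learn_eigenmodes(Phi, n):
    """Phi: (N, D) matrix of N flattened training embeddings; returns mean shape and n eigenmodes."""
    phi0 = Phi.mean(axis=0)
    # Rows of Vt are the principal directions of the centered training set.
    _, _, Vt = np.linalg.svd(Phi - phi0, full_matrices=False)
    return phi0, Vt[:n]               # shapes (D,) and (n, D)

def project(phi, phi0, Psi):
    """Shape vector alpha of an embedding phi."""
    return Psi @ (phi - phi0)         # shape (n,)

def reconstruct(alpha, phi0, Psi):
    """Approximate embedding phi_alpha = phi0 + sum_i alpha_i psi_i."""
    return phi0 + Psi.T @ alpha
```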
How effective is the projection?
- Figure: frames from the actual walking sequence next to the same frames approximated using 6 eigenmodes.
Projections instead of the actual shape
- We now work with the shape parameters α_t, which are treated as synonymous with the segmentations.
- We therefore need to maximize the conditional probability P(α_t, θ_t | I_t, α_{1:t-1}, θ_{1:t-1}) ∝ P(I_t | α_t, θ_t) P(α_t, θ_t | α_{1:t-1}, θ_{1:t-1}).
- Modeling the second factor, the dynamical shape prior, is the contribution of this paper.
Temporal evolution of shapes
- The paper proposes to learn the temporal dynamics by modeling the shape vectors α_t with a Markov chain of order k.
- The probability of a shape conditioned on the shapes at the previous k time steps is then modeled as a conditional Gaussian of autoregressive form (spelled out below).
- Doubt: should this distribution itself depend on the time t?
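In the standard autoregressive form such a model takes (reconstructed here, since the slide's equation is not reproduced), with mean μ, transition matrices A_1, ..., A_k and noise covariance Σ:

```latex
P(\alpha_t \mid \alpha_{t-1}, \dots, \alpha_{t-k}) \;\propto\;
\exp\!\Big(-\tfrac{1}{2}\, v^\top \Sigma^{-1} v\Big),
\qquad
v \;=\; \alpha_t - \mu - \sum_{i=1}^{k} A_i\, \alpha_{t-i} .
```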
Effectiveness of predicting the evolution
- The paper considers 151 frames of a training sequence and estimates the parameters of a second order autoregressive model.
- These model parameters can then be used to synthesize walking sequences.
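As a sketch of how such a second order model can be estimated by least squares and then run forward to synthesize shape vectors (illustrative only; the paper's exact estimation procedure may differ):

```python
import numpy as np

def fit_ar2(alphas):
    """alphas: (T, n) array of training shape vectors a_0 ... a_{T-1}.
    Fits a_t = mu + A1 a_{t-1} + A2 a_{t-2} + eta, with eta ~ N(0, Sigma)."""
    T, n = alphas.shape
    X = np.hstack([np.ones((T - 2, 1)), alphas[1:T - 1], alphas[0:T - 2]])  # regressors [1, a_{t-1}, a_{t-2}]
    Y = alphas[2:]                                                          # targets a_t
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)                               # shape (1 + 2n, n)
    mu, A1, A2 = W[0], W[1:1 + n].T, W[1 + n:].T
    Sigma = np.cov((Y - X @ W).T)                                           # residual covariance
    return mu, A1, A2, Sigma

def synthesize(mu, A1, A2, Sigma, a0, a1, steps, rng=None):
    """Runs the fitted model forward to generate a synthetic sequence of shape vectors."""
    rng = np.random.default_rng(0) if rng is None else rng
    seq = [a0, a1]
    for _ in range(steps):
        mean = mu + A1 @ seq[-1] + A2 @ seq[-2]
        seq.append(rng.multivariate_normal(mean, Sigma))
    return np.array(seq)
```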
Synthesis of walking sequences
- The synthesized sequence captures most of the characteristic motion of the walking person.
- The discrepancies in the shapes are due to the great reduction in the number of parameters used to describe the motion.
Coupling the shape and transformation parameters
- In general, one can expect the deformation parameters and the transformation parameters to be coupled.
- We want to learn dynamical models that are invariant to rotation, translation or any other desired transformations.
- Instead of the absolute transformation, we therefore consider the incremental transformation between frames.
- Rather than the shape vector alone, we deal with a new joint vector that stacks the shape vector and the incremental transformation (see below).
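One way to write this coupled state, in the spirit of the slide (the paper's exact parameterization is not reproduced here):

```latex
\beta_t \;=\; \begin{pmatrix} \alpha_t \\[2pt] \theta_t - \theta_{t-1} \end{pmatrix} ,
```

and the autoregressive model of the previous slides is then learned on β_t instead of on α_t alone.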
Optimization using gradient descent
- Given an image I_t and the previously estimated shape parameters α_{1:t-1} and transformation parameters θ_{1:t-1}, the goal is to maximize the conditional probability P(α_t, θ_t | I_t, α_{1:t-1}, θ_{1:t-1}) with respect to α_t and θ_t.
- This goal can be achieved by minimizing its negative logarithm, an energy of the form E(α_t, θ_t) = E_data(α_t, θ_t) + ν E_shape(α_t, θ_t), where ν controls the relative weighting between the shape prior and the data term.
A close look at the energy function
- The data term can be written (up to constants) as E_data = Σ_x e_1(x) H(φ(x)) + e_2(x) (1 − H(φ(x))), where e_i(x) = (I_t(x) − μ_i)² / (2σ_i²) + (1/2) log σ_i² compares each pixel to the Gaussian model of region i, and φ = φ_{α_t} is evaluated in the coordinate frame given by θ_t.
- The shape term is the negative logarithm of the dynamical prior, E_shape = −log P(α_t, θ_t | α_{1:t-1}, θ_{1:t-1}), i.e. the quadratic form in the autoregressive residual v.
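A self-contained numerical sketch of the data term (a discretized Chan-Vese style functional with per-region variances; the names and the smoothed Heaviside are illustrative choices, not taken from the paper):

```python
import numpy as np

def data_energy(image, phi, eps=1.0):
    """Two-region negative log-likelihood with Gaussian models of unknown mean and variance."""
    h = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))       # smoothed Heaviside H(phi)
    mu1 = (image * h).sum() / h.sum()                             # object mean
    var1 = (((image - mu1) ** 2) * h).sum() / h.sum()             # object variance
    mu2 = (image * (1 - h)).sum() / (1 - h).sum()                 # background mean
    var2 = (((image - mu2) ** 2) * (1 - h)).sum() / (1 - h).sum() # background variance
    e1 = (image - mu1) ** 2 / (2 * var1) + 0.5 * np.log(var1)     # per-pixel cost inside
    e2 = (image - mu2) ** 2 / (2 * var2) + 0.5 * np.log(var2)     # per-pixel cost outside
    return (e1 * h + e2 * (1 - h)).sum()
```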
Updating the shape
- The update equation with respect to the shape parameters is a gradient descent dα_t/dτ = −∂E_data/∂α_t − ν ∂E_shape/∂α_t.
- The first term acts along the contour and drives the separation of the image intensities into the two Gaussian models.
- The second term draws the shape towards the prior, i.e. towards the autoregressive prediction from the previous frames.
Updating the transformation parameters
- The update equation with respect to the transformation parameters has the analogous form dθ_t/dτ = −∂E_data/∂θ_t − ν ∂E_shape/∂θ_t.
- The data term draws the shape towards the most likely transformation given the image intensities.
- The prior term draws the transformation towards the prior.
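Putting the two updates together, a schematic per-frame optimization might look as follows; this is an illustrative sketch using numerical gradients (the paper derives the gradients analytically), and `energy` stands for any implementation of E(α_t, θ_t):

```python
import numpy as np

def track_frame(energy, alpha_init, theta_init, step=0.1, iters=200):
    """One frame of alternating gradient descent on E(alpha, theta)."""
    def num_grad(f, x, h=1e-4):
        # Central finite differences, purely for illustration.
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
        return g

    alpha = alpha_init.astype(float).copy()
    theta = theta_init.astype(float).copy()
    for _ in range(iters):
        alpha -= step * num_grad(lambda a: energy(a, theta), alpha)
        theta -= step * num_grad(lambda t: energy(alpha, t), theta)
    return alpha, theta
```

In the full algorithm, the shape prior term inside the energy is what couples the current frame to the autoregressive prediction from the previous frames.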
Noisy images of a man walking Results Noisy images of a man walking Segmentation using static shape priors, 25%noise Segmentation using static shape priors, 50%noise
Results (contd.)
- Figure: segmentation using dynamic shape priors at 50% noise.
Results (contd.)
- Figure: segmentation using dynamic shape priors at 75% noise.
Results (contd.)
- Figure: comparison of the algorithm's segmentations with the ground truth.
Results (contd.)
- Invariance to walking speed:
  - To increase the speed, frames of the test sequence were removed.
  - To decrease the speed, frames were replicated and inserted.