
1 Improved Initialisation and Gaussian Mixture Pairwise Terms for Dense Random Fields with Mean-field Inference
Vibhav Vineet, Jonathan Warrell, Paul Sturgess, Philip H.S. Torr

2 Labelling Problem Assign a label to each image pixel
Many problems in computer vision, such as object segmentation, stereo, and object detection, are modelled as labelling problems. (Figure: examples of object segmentation, stereo, and object detection.)

3 Problem Formulation Find a labelling that maximises the conditional probability or minimises the energy function
These problems involve finding a labelling that maximises the conditional probability, or equivalently minimises the energy function, given the image.
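For concreteness, here is a hedged sketch of the standard dense CRF formulation (the slide's own equation is not reproduced in the transcript; this follows the usual convention, with unary potentials \psi_u and pairwise potentials \psi_p over pixel labels x_i):

```latex
P(\mathbf{x} \mid I) = \frac{1}{Z(I)} \exp\!\big(-E(\mathbf{x})\big), \qquad
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{i < j} \psi_p(x_i, x_j)
```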

4 Problem Formulation Grid CRF leads to over-smoothing around boundaries
Energy functions are generally defined over a grid CRF, where connectivity is limited to 4 or 8 neighbours; however, this limited connectivity leads to over-smoothing around object boundaries. (Figure: grid CRF construction and the resulting inference output.)

5 Dense CRF construction
Problem Formulation Grid CRF leads to over-smoothing around boundaries; a dense CRF is able to recover fine boundaries. An alternative and more powerful strategy is to define dense connectivity, which captures long-range interactions among variables. This recovers fine boundaries, and the output looks much more natural. (Figure: dense CRF construction and the resulting inference output.)

6 Inference in Dense CRF Very high time complexity
Graph-cuts based methods are not feasible: alpha-expansion takes almost 1200 seconds per image with a neighbourhood size of 15 on the PascalVOC segmentation dataset. The time complexity of such algorithms grows with dense connectivity, making them infeasible in practice.

7 Inference in Dense CRF Filter-based mean-field inference takes 0.2 secs*
Efficient inference under two assumptions: a mean-field approximation to the CRF, and pairwise terms with Gaussian weights. Recently, a highly efficient method proposed by Krahenbuhl et al. performs inference on such problems within milliseconds without losing accuracy. Their framework makes two assumptions: they take a mean-field approximation to the CRF, and their pairwise terms take Gaussian weights. *Krahenbuhl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011

8 Efficient inference in dense CRF
Mean-field methods (Jordan et al., 1999): inference with the true distribution P is intractable, so we approximate P with a distribution from a tractable family. In the mean-field method we take a tractable family of distributions and find the member Q of this family that is closest to P.
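Closeness is measured by the KL divergence; a minimal sketch of the standard mean-field objective (notation assumed, matching the formulation above):

```latex
Q^{*} = \arg\min_{Q \in \mathcal{Q}} \mathrm{KL}(Q \,\|\, P)
      = \arg\min_{Q \in \mathcal{Q}} \sum_{\mathbf{x}} Q(\mathbf{x}) \log \frac{Q(\mathbf{x})}{P(\mathbf{x} \mid I)}
```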

9 Naïve mean field Assume all variables are independent
One such family is obtained by assuming that all variables are independent.
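Under this naive mean-field assumption the approximating distribution factorises over the pixels:

```latex
Q(\mathbf{x}) = \prod_{i} Q_i(x_i)
```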

10 Efficient inference in dense CRF
Assume Gaussian pairwise weights: a mixture of Gaussian kernels. Their second assumption concerns the pairwise terms, which take a linear combination of Gaussians; specifically, they use a bilateral kernel and a spatial kernel.
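In the referenced NIPS 2011 model these two kernels take the following standard form (p_i are pixel positions, I_i are colour vectors, and the \theta parameters are kernel bandwidths; the first term is the bilateral kernel, the second the spatial kernel):

```latex
k(\mathbf{f}_i, \mathbf{f}_j) =
w^{(1)} \exp\!\Big(-\frac{|p_i - p_j|^2}{2\theta_\alpha^2} - \frac{|I_i - I_j|^2}{2\theta_\beta^2}\Big)
+ w^{(2)} \exp\!\Big(-\frac{|p_i - p_j|^2}{2\theta_\gamma^2}\Big)
```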

11 Marginal update The marginal update involves the expectation of the cost over the distribution Q, given that x_i takes label l
The expensive message-passing step is solved using a highly efficient permutohedral-lattice based filtering approach. With these two assumptions, the marginal update evaluates the expectation of the unary and pairwise costs under the distribution Q; the expensive message passing from all variables j to variable i is computed with fast bilateral filtering. Finally, the maximum posterior marginal (MPM) with the approximate distribution is found by taking, for each variable, the label that maximises its marginal.
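A minimal Python sketch of this update loop, assuming a spatial Gaussian kernel with Potts compatibility only, and using scipy's gaussian_filter as a stand-in for the permutohedral-lattice bilateral filter (the colour-dependent kernel is omitted for brevity):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def mean_field(unary, n_iters=10, theta=3.0, w=1.0):
    """Simplified mean-field inference for a dense CRF.
    unary: (H, W, L) array of unary costs psi_u(x_i = l)."""
    Q = softmax(-unary)                       # initialise from the unaries
    for _ in range(n_iters):
        # Message passing: Gaussian-filter each label channel of Q.
        msg = np.stack([gaussian_filter(Q[..., l], sigma=theta)
                        for l in range(Q.shape[-1])], axis=-1)
        # Compatibility transform (Potts): a pixel pays for the filtered
        # mass of every *other* label in its neighbourhood.
        pairwise = w * (msg.sum(-1, keepdims=True) - msg)
        Q = softmax(-(unary + pairwise))      # local update and normalise
    return Q

# MPM decoding: for each pixel, pick the label maximising its marginal.
# labels = mean_field(unary).argmax(axis=-1)
```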

12 Q distribution Q distribution for different classes across iterations on the CamVid dataset (iteration 0)
The marginal update is an iterative process; each iteration gives a better approximation. Here we show how the marginals of some of the classes evolve across iterations on the CamVid dataset. (Figure: per-class marginal maps; colour scale from 0.1 to 1.)

13 Q distribution Q distribution for different classes across iterations on the CamVid dataset (iteration 1). (Figure: per-class marginal maps; colour scale from 0.1 to 1.)

14 Q distribution Q distribution for different classes across iterations on the CamVid dataset (iteration 2). (Figure: per-class marginal maps; colour scale from 0.1 to 1.)

15 Q distribution Q distribution for different classes across iterations on the CamVid dataset (iteration 10). (Figure: per-class marginal maps; colour scale from 0.1 to 1.)

16 Q distribution Q distribution for different classes across iterations on the CamVid dataset (iterations 0, 1, 2, and 10 side by side).

17 Two issues associated with the method
Sensitive to initialisation; restrictive Gaussian pairwise weights. There are two issues with their method: first, it is sensitive to initialisation; second, the Gaussian components are restricted to zero mean.

18 Our Contributions Resolve two issues associated with the method
Sensitive to initialisation: we propose a SIFT-flow based initialisation method. Restrictive Gaussian pairwise weights: an expectation-maximisation (EM) based strategy to learn a more general Gaussian mixture model. In this work we resolve both issues: first, a SIFT-flow based strategy that provides a better initialisation; second, an EM-based method for learning a more general class of Gaussian mixture models.

19 Sensitivity to initialisation
Experiment on the PascalVOC-10 segmentation dataset:

Initialisation       Mean-field   Alpha-expansion
Unary potential      28.52%       27.88%
Ground truth label   41%          -

We observe an improvement of almost 13% in I/U score when initialising mean-field inference with the ground-truth labelling instead of the unary potentials, whereas alpha-expansion is quite robust to initialisation. A good initialisation can thus lead to a better solution, and we propose a SIFT-flow based initialisation method.

20 SIFT-flow based correspondence
Given a test image, we first retrieve a set of nearest neighbours from the training set using GIST features. (Figure: test image and nearest neighbours retrieved from the training set.)
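A minimal sketch of this retrieval step, assuming the GIST descriptors have already been computed for all images (the Euclidean metric and the array layout are assumptions; GIST extraction itself is not shown):

```python
import numpy as np

def retrieve_neighbours(test_gist, train_gists, k=5):
    """Return indices of the k training images whose GIST descriptors
    are closest (Euclidean distance) to the test image's descriptor."""
    dists = np.linalg.norm(train_gists - test_gist, axis=1)
    return np.argsort(dists)[:k]
```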

21 SIFT-flow based correspondence
K nearest neighbours warped to the test image. We warp these nearest neighbours to the test image and obtain their corresponding flows. (Figure: test image, warped nearest neighbours, and their flow values, e.g. 13.31, 14.31, 18.38, 23.31, 27.2, 30.87.)

22 SIFT-flow based correspondence
Pick the best nearest neighbour based on the flow value. (Figure: test image, best nearest neighbour, and warped image; flow: 13.31.)

23 Label transfer Warp the ground truth according to the correspondence; transfer labels from the top-1 neighbour using the flow
We take the ground truth of the closest nearest neighbour and warp it according to the flow. The warped ground truth looks quite similar to the actual ground truth of the test image, and this forms the basis for our initialisation. (Figure: ground truth of the test image, ground truth of the best nearest neighbour, and the ground truth warped according to the flow.)
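A hedged sketch of this warping step for a discrete label map (the (dy, dx) flow convention and the backward-warping direction are assumptions; nearest-neighbour sampling keeps the labels discrete):

```python
import numpy as np

def warp_labels(labels, flow):
    """Warp a (H, W) integer label map by a dense flow field,
    where flow[y, x] = (dy, dx), using nearest-neighbour sampling."""
    H, W = labels.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, W - 1)
    return labels[src_y, src_x]
```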

24 SIFT-flow based initialisation
Rescore the unary potentials: a factor s rescores the unary potential of a variable according to the label observed after the label-transfer stage; s is set through cross-validation. Reweighting the unary potential of each pixel using the SIFT-flow transferred label boosts the results considerably; with this we are able to recover building parts properly. (Figure: test image, ground truth, output without rescoring, and output after rescoring; qualitative improvement in accuracy after using the rescored unary potentials.)
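One plausible reading of this rescoring, sketched below (the multiplicative form, the array layout, and the value of s are assumptions; the slide does not reproduce the exact rescoring function):

```python
import numpy as np

def rescore_unary(unary, transferred, s=0.5):
    """Lower the (H, W, L) unary cost of the label suggested by the
    SIFT-flow label transfer at each pixel; s is cross-validated."""
    out = unary.copy()
    ys, xs = np.mgrid[0:unary.shape[0], 0:unary.shape[1]]
    out[ys, xs, transferred] *= s   # hypothetical multiplicative boost
    return out
```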

25 SIFT-flow based initialisation
Initialise the mean-field solution. We further use the transferred labels to initialise the mean-field inference itself, together with the rescored unary terms and the pairwise weights, and see some qualitative improvement here as well. (Figure: test image, ground truth, output without initialisation, and output with initialisation.)

26 Gaussian pairwise weights
Experiment on the PascalVOC-10 segmentation dataset: we plot the distribution of class-class interactions by selecting random pairs of points (i, j). The feature vector contains the differences of the x and y coordinates of a pair of pixels i, j; we randomly sample such pairs and plot them. (Figure: aeroplane-aeroplane, horse-person, and car-person distributions.)

27 Gaussian pairwise weights
Experiment on the PascalVOC-10 segmentation dataset. Most of the aeroplane-aeroplane data are distributed horizontally, the horse-person data are distributed vertically, and the car-person data are not centred around zero. Such complex structure cannot be captured by a zero-mean Gaussian. We therefore propose an EM-based learning strategy to incorporate a more general class of Gaussian mixture models.

28 Our model Our energy function takes the following form:
We use separate weights for each label pair, but the Gaussian components are shared. We add new pairwise weights to the original form of the energy function: a Gaussian mixture with M components that are shared across label pairs, with separate weights for each label pair (x_i, x_j). The zero-mean Gaussians could in principle be subsumed by our learnt mixture model, but we found it better to keep both mixture models. We follow a piecewise learning strategy to learn the parameters of our energy function.
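The slide's equation is not reproduced in the transcript; a hedged reconstruction consistent with the description above (M shared Gaussian components with label-pair-specific weights w_m(x_i, x_j), kept alongside the original zero-mean kernels k^{(0)} with compatibility \mu; the exact form in the paper may differ):

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i)
+ \sum_{i<j} \mu(x_i, x_j)\, k^{(0)}(\mathbf{f}_i, \mathbf{f}_j)
+ \sum_{i<j} \sum_{m=1}^{M} w_m(x_i, x_j)\, \mathcal{N}(\mathbf{f}_i - \mathbf{f}_j;\ \boldsymbol{\mu}_m, \Sigma_m)
```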

29 Learning mixture model
Learn the parameters similarly to this model*. First we learn the parameters of the original zero-mean model. *Krahenbuhl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011

30 Learning mixture model
Learn the parameters similarly to this model*. Then learn the parameters of the Gaussian mixture: the means, standard deviations, and mixing coefficients. *Krahenbuhl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011

31 Learning mixture model
Learn the parameters similarly to this model*. Learn the parameters of the Gaussian mixture (the means, standard deviations, and mixing coefficients); lambda is set through cross-validation. *Krahenbuhl et al., Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials, NIPS 2011

32 Our model We follow a generative training model
Maximise the joint likelihood of pairs of labels and features, with a latent variable giving the cluster assignment. We use an expectation-maximisation (EM) based method to maximise this likelihood function.
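A minimal, self-contained EM sketch for fitting such a mixture to pairwise feature differences (plain maximum likelihood over samples X; the per-label-pair weighting from our energy function is omitted for brevity):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, M=3, n_iters=50, seed=0):
    """Fit an M-component Gaussian mixture to X of shape (N, D) by EM;
    the responsibilities r[n, m] play the role of the latent cluster
    assignment. Returns mixing coefficients, means, and covariances."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(M, 1.0 / M)
    mu = X[rng.choice(N, size=M, replace=False)]        # init means
    cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * M)
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component.
        r = np.stack([pi[m] * multivariate_normal.pdf(X, mean=mu[m], cov=cov[m])
                      for m in range(M)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and covariances.
        Nk = r.sum(axis=0)
        pi = Nk / N
        mu = (r.T @ X) / Nk[:, None]
        for m in range(M):
            d = X - mu[m]
            cov[m] = (r[:, m, None] * d).T @ d / Nk[m] + 1e-6 * np.eye(D)
    return pi, mu, cov
```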

33 Learning mixture model
Our model is able to capture the true distribution of class-class interactions. The learnt models capture the distribution of the data. (Figure: learnt aeroplane-aeroplane, horse-person, and car-person models.)

34 Inference with mixture model
This involves evaluating M extra Gaussian terms: we perform blurring on mean-shifted points, which increases the time complexity.
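The mean-shift trick rests on a simple identity: a Gaussian with a non-zero mean over feature differences can be evaluated with a standard zero-mean blur after shifting the query points (notation as in the hedged energy sketch above):

```latex
\mathcal{N}(\mathbf{f}_i - \mathbf{f}_j;\ \boldsymbol{\mu}_m, \Sigma_m)
= \mathcal{N}\big((\mathbf{f}_i - \boldsymbol{\mu}_m) - \mathbf{f}_j;\ \mathbf{0}, \Sigma_m\big)
```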

35 Without initialisation
Experiments on CamVid: Q distribution for the building class on the CamVid dataset (iteration 0). We show the Q distribution for the building class before and after initialisation; the confidence of building pixels increases with initialisation and keeps improving across iterations. (Figure: ground truth, without initialisation, with initialisation; colour scale from 0.1 to 1.)

36 Without initialisation
Experiments on CamVid: Q distribution for the building class on the CamVid dataset (iteration 1). (Figure: ground truth, without initialisation, with initialisation; colour scale from 0.1 to 1.)

37 Without initialisation
Experiments on CamVid: Q distribution for the building class on the CamVid dataset (iteration 2). (Figure: ground truth, without initialisation, with initialisation; colour scale from 0.1 to 1.)

38 Without initialisation
Experiments on CamVid: Q distribution for the building class on the CamVid dataset (iteration 10). (Figure: ground truth, without initialisation, with initialisation; colour scale from 0.1 to 1.)

39 Experiments on CamVid Qualitatively we obtain much better results that resemble the ground-truth labels; the building is properly recovered with our initialisation strategy. (Figure: image, ground truth, without initialisation, with initialisation.)

40 Experiments on CamVid Quantitative results on the CamVid dataset

Algorithm       Time (s)   Overall (%-corr)   Av. Recall   Av. U/I
Alpha-exp       0.96       78.84              58.64        43.89
APST (U+P+H)    1.6        85.18              60.06        50.62
DenseCRF        0.2        79.96              59.29        45.18
Ours (U+P+I)    0.35       85.31              59.75        50.56

Our model with unary and pairwise terms achieves better accuracy than other, more complex models, and is generally much more efficient than the other methods.

41 Experiments on CamVid Qualitative results on the CamVid dataset
(Figure: image, ground truth, alpha-expansion, ours.) We are able to recover buildings and trees properly.

42 Experiments on PascalVOC-10
Qualitative results of the SIFT-flow method. (Figure: image, ground truth, warped nearest-neighbour ground truth, output without SIFT-flow, output with SIFT-flow.) We are able to recover missing body parts.

43 Experiments on PascalVOC-10
Quantitative results on the PascalVOC-10 segmentation dataset

Algorithm           Time (s)   Overall (%-corr)   Av. Recall   Av. U/I
Alpha-exp           3.0        79.52              36.08        27.88
AHCRF+Cooc          36         81.43              38.01        30.9
Dense CRF           0.67       71.63              34.53        28.4
Ours1 (U+P+GM)      26.7       80.23              36.41        28.73
Ours2 (U+P+I)       0.90       79.65              41.84        30.95
Ours3 (U+P+I+GM)    -          78.96              44.05        31.48

Our model with unary and pairwise terms achieves better accuracy than other, more complex models, and is generally much more efficient than the other methods.

44 Experiments on PascalVOC-10
Qualitative results on the PascalVOC-10 segmentation dataset. (Figure: image, ground truth, alpha-expansion, Dense CRF, ours.) We are able to recover missing objects and body parts.

45 Conclusion Filter-based mean-field inference promises high efficiency and accuracy. We proposed methods to robustify the basic mean-field method: a SIFT-flow based method for better initialisation, and an EM-based algorithm for learning a general Gaussian mixture model. More complex higher-order models can be incorporated into the pairwise model.

46 Thank you 

