Efficient Inference for Fully-Connected CRFs with Stationarity
Yimeng Zhang, Tsuhan Chen
CVPR 2012
Summary
- Explores object-class segmentation with fully-connected CRF models
- The only restriction on the pairwise terms is 'spatial stationarity' (i.e. they depend only on the relative locations of pixel pairs)
- Shows how efficient inference can be achieved by:
  - using a QP formulation
  - using the FFT to calculate gradients in O(N log N) time
Fully-connected CRF model
- General pairwise CRF model over an image I with class labelling X and label set L
- V = set of pixels, N_i = neighbourhood of pixel i, Z(I) = partition function, psi = potential functions
- In a fully-connected CRF, N_i = V for all i
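Putting the notation above together, the general pairwise CRF distribution takes the standard Gibbs form (a reconstruction from the slide's definitions, not copied from the paper):

```latex
P(X \mid I) \;=\; \frac{1}{Z(I)}
\exp\!\Big( -\sum_{i \in V} \psi_u(x_i; I)
\;-\; \sum_{i \in V} \sum_{j \in N_i} \psi_p(x_i, x_j; I) \Big)
```

In the fully-connected case the neighbourhood N_i covers all other pixels, so the pairwise sum ranges over every pair in V.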
Unary Potential
- Generates a score for each object class at each pixel (TextonBoost)
Pairwise Potential
- Measures the compatibility of the labels at each pair of pixels
- Combines a spatial factor and a colour-contrast factor
- The spatial term is learned from data
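As a concrete illustration of a stationary pairwise term, the sketch below combines a relative-position lookup (standing in for the learned spatial term) with a Gaussian colour-contrast factor. The table `rel_pos_table`, the product form, and the bandwidth `sigma` are illustrative assumptions, not the paper's exact definition:

```python
import numpy as np

def pairwise_weight(rel_pos_table, colour_i, colour_j, dy, dx, sigma=10.0):
    """Illustrative stationary pairwise weight for one pixel pair.

    rel_pos_table : 2-D lookup over relative offsets (dy, dx); stands in for
                    the learned spatial term, which by stationarity depends
                    only on p_i - p_j, not on absolute position.
    colour_i/j    : colour vectors of the two pixels.
    sigma         : colour-contrast bandwidth (an illustrative choice).
    """
    spatial = rel_pos_table[dy, dx]
    contrast = np.exp(-np.sum((colour_i - colour_j) ** 2) / (2.0 * sigma ** 2))
    return spatial * contrast
```

Identical colours give a contrast factor of 1, so the weight reduces to the spatial entry; strong colour edges shrink the weight toward 0.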
MAP inference using QP relaxation
- Introduce a binary indicator variable for each pixel and label
- MAP inference is expressed as a quadratic integer program, then relaxed to give the QP
- The QP relaxation is provably tight in all cases (Ravikumar & Lafferty, ICML 2006 [24])
- Moreover, it is convex whenever the matrix of edge weights is negative definite; an additive bound holds in the non-convex case
- The QP requires O(KN) variables, versus O(K^2 E) for the LP relaxation
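With the binary indicators relaxed to y_i(l) ∈ [0, 1], the QP takes the standard form from the relaxation literature (reconstructed here, not copied from the slide):

```latex
\min_{y}\;\; \sum_{i \in V} \sum_{l \in L} \theta_i(l)\, y_i(l)
\;+\; \sum_{i \in V} \sum_{j \in N_i} \sum_{l, l' \in L} \theta_{ij}(l, l')\, y_i(l)\, y_j(l')
\qquad \text{s.t.}\;\; \sum_{l \in L} y_i(l) = 1,\quad y_i(l) \ge 0 ,
```

where theta_i and theta_ij denote the unary and pairwise costs.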
- Gradient: a fixed-point update is derived by forming the Lagrangian and setting its derivative to zero
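A schematic version of such a simplex-preserving fixed-point step is sketched below: a multiplicative update followed by per-pixel renormalisation, which plays the role of the Lagrange multiplier for the sum-to-one constraint. This is an illustration of the general pattern, not the paper's exact update:

```python
import numpy as np

def fixed_point_step(y, grad, eps=1e-12):
    """One schematic fixed-point step for the simplex-constrained QP.

    y    : (N, K) relaxed labelling; each row lies on the probability simplex.
    grad : (N, K) positive per-label weights derived from the QP gradient
           (assumed already shifted/exponentiated to be positive).
    """
    y_new = y * grad
    # renormalise each pixel's row so it stays on the simplex
    y_new /= y_new.sum(axis=1, keepdims=True) + eps
    return y_new
```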
Illustration of QP updates
Efficiently evaluating the gradient
- The required summation would be a convolution without the colour term
- With the colour term, it requires 5D filtering
- This can be approximated by clustering colours into C clusters, reducing the cost to C convolutions
- Hence, for the case x_i = x_j, the exact per-pixel summation is replaced by evaluations at C colour-cluster centres (C = 10 to 15), and the result at each pixel is obtained by interpolation
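The clustering trick can be sketched as follows: weight the score map by each cluster centre's colour affinity, convolve each weighted map with the stationary spatial filter via FFT, then read off each pixel's value from its own cluster. The Gaussian affinity, the bandwidth, and the nearest-cluster read-off (instead of the paper's smoother interpolation) are simplifying assumptions:

```python
import numpy as np

def clustered_fft_gradient(scores, colours, centres, spatial_filter):
    """Approximate the colour-weighted spatial summation with C convolutions.

    scores         : (H, W) per-pixel scores for one label (illustrative)
    colours        : (H, W, 3) pixel colours
    centres        : (C, 3) colour-cluster centres (assumed given, e.g. k-means)
    spatial_filter : (H, W) stationary filter, centred, same size as the image
    """
    H, W, _ = colours.shape
    C = centres.shape[0]
    # precompute the filter's FFT once (centre the filter at the origin)
    F = np.fft.rfft2(np.fft.ifftshift(spatial_filter))
    # squared colour distance of every pixel to every cluster centre: (C, H, W)
    d2 = ((colours[None] - centres[:, None, None]) ** 2).sum(-1)
    affinity = np.exp(-d2 / (2.0 * 10.0 ** 2))  # sigma = 10 is illustrative
    out_per_cluster = np.empty((C, H, W))
    for c in range(C):
        # one FFT convolution per colour cluster
        out_per_cluster[c] = np.fft.irfft2(
            np.fft.rfft2(scores * affinity[c]) * F, s=(H, W))
    # nearest-cluster read-off per pixel (the paper interpolates instead)
    nearest = d2.argmin(axis=0)
    return np.take_along_axis(out_per_cluster, nearest[None], axis=0)[0]
```

Each update thus costs C forward FFTs and C inverse FFTs instead of an O(N^2) pairwise summation.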
Update complexity
- The FFTs of the spatial filters can be computed in advance (K^2 filters)
- Each update requires C forward FFTs: O(C N log N)
- K^2 convolutions are needed, each a pointwise multiplication: O(K^2 C N)
- Terms can be summed in the Fourier domain, so only KC inverse FFTs are needed: O(K C N log N)
- Run time per iteration < 0.1 s for 213x320-pixel images (after downsampling by a factor of 5)
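The Fourier-domain accumulation is easy to illustrate: because the FFT is linear, the per-filter terms can be summed before a single inverse transform, rather than inverse-transforming each convolution separately. A minimal sketch:

```python
import numpy as np

def fourier_accumulate(weighted_maps, filter_ffts):
    """Sum filtered terms in the Fourier domain; one inverse FFT per output.

    weighted_maps : list of (H, W) real arrays (e.g. one per label l')
    filter_ffts   : list of precomputed rFFTs of the matching spatial filters
    """
    H, W = weighted_maps[0].shape
    acc = np.zeros((H, W // 2 + 1), dtype=complex)
    for m, F in zip(weighted_maps, filter_ffts):
        acc += np.fft.rfft2(m) * F  # multiply = circular convolution
    return np.fft.irfft2(acc, s=(H, W))  # single inverse transform
```

By linearity this equals the sum of the individually inverse-transformed convolutions, which is exactly the saving that reduces the inverse-FFT count from K^2 C to KC.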
MSRC synthetic experiment
- Unary terms randomised
- Spatial distributions set to ground truth
MSRC synthetic experiment Running times
Sowerby synthetic experiment
MSRC full experiment
- Uses TextonBoost unary potentials
- Compared with several other CRFs using the same unaries:
  - Grid only
  - Grid + P^N (Kohli et al., CVPR 2008)
  - Grid + P^N + co-occurrence (Ladický et al., ECCV 2010)
  - Fully connected + Gaussian spatial kernels (Krähenbühl & Koltun, NIPS 2011)
MSRC full experiment Qualitative comparison
MSRC full experiment: quantitative comparison
- Overall and per-class accuracy
- Timing: 2-8 s per image