Associative Hierarchical CRFs for Object Class Image Segmentation
International Conference on Computer Vision (ICCV) 2009
Ľubor Ladický and Chris Russell, Oxford Brookes University
Pushmeet Kohli, Microsoft Research Cambridge
Philip H.S. Torr, Oxford Brookes University
Outline
- Introduction
- Random Fields for Labelling Problems
- Hierarchical CRF for Object Segmentation
- Experiments and Results
Structure
Introduction
Mean Shift
- Setting: a distribution of identical billiard balls
- Objective: find the densest region
- Elements: region of interest, center of mass, mean shift vector
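The mean shift idea on this slide can be sketched in a few lines. This is a minimal illustration, assuming a flat (uniform) kernel and 2-D points; the function name `mean_shift_mode`, the parameters `radius`, `tol` and `max_iter`, and the sample points are hypothetical and not taken from the talk.

```python
import math

def mean_shift_mode(points, start, radius, tol=1e-6, max_iter=100):
    """Move a circular region of interest until it settles on a dense region."""
    center = start
    for _ in range(max_iter):
        # Points that fall inside the current region of interest.
        inside = [p for p in points if math.dist(p, center) <= radius]
        if not inside:
            break
        # Center of mass of those points (all balls weigh the same).
        mass = tuple(sum(coord) / len(inside) for coord in zip(*inside))
        # Length of the mean shift vector from the center to the center of mass.
        shift = math.dist(mass, center)
        center = mass
        if shift < tol:
            break
    return center

# Four balls clustered near the origin and one outlier (made-up data).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (3.0, 3.0)]
mode = mean_shift_mode(points, start=(1.0, 1.0), radius=1.5)
```

Starting from (1.0, 1.0), the region moves to the center of mass of the four clustered points and stops there, since the outlier never enters the region.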
Pixel vs. Segment
The difference between pairwise CRFs based on pixels and on segments:
Based on pixels:
- No quantization errors
- Lack of long-range interactions
- Results are oversmoothed
Based on segments:
- Allows long-range interactions
- Cannot recover from an incorrect segmentation
Introduction
- Propose a novel hierarchical CRF
- Integrate features derived from different quantisation levels
- Propose new, sophisticated potentials defined over the different levels of the quantisation hierarchy
- Use a novel formulation that allows context to be incorporated at multiple levels of quantisation
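Random Fields for Labelling Problems
- Introduce the pixel-based CRF used to formulate the object class segmentation problem
- One discrete random variable per image pixel, each of which may take a value from the set of labels
Symbols:
- Labels: $\mathcal{L} = \{l_1, l_2, \dots, l_k\}$
- Random variables: $\mathbf{X} = \{X_1, X_2, \dots, X_N\}$
- Pixels: $i \in \mathcal{V} = \{1, 2, \dots, N\}$
- $\mathcal{N}_i$: the set of all neighbours of the variable $X_i$
- Clique: a clique $c$ is a set of random variables $\mathbf{X}_c \subseteq \mathbf{X}$
- Labelling: denoted by $\mathbf{x}$, takes its value from $\mathcal{L}^N$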
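CRFs
- $D$: the set of the data; $Z$: the partition function; $\mathcal{C}$: the set of all cliques
- $\psi_c(\mathbf{x}_c)$: the potential function of the clique $c$
- The posterior distribution:
$$\Pr(\mathbf{x} \mid D) = \frac{1}{Z} \exp\Big(-\sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)\Big)$$
- The energy form:
$$E(\mathbf{x}) = -\log \Pr(\mathbf{x} \mid D) - \log Z = \sum_{c \in \mathcal{C}} \psi_c(\mathbf{x}_c)$$
- The most probable, or MAP, labeling:
$$\mathbf{x}^* = \arg\max_{\mathbf{x}} \Pr(\mathbf{x} \mid D) = \arg\min_{\mathbf{x}} E(\mathbf{x})$$
- Written as the sum of unary and pairwise potentials:
$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(x_i) + \sum_{i \in \mathcal{V},\, j \in \mathcal{N}_i} \psi_{ij}(x_i, x_j)$$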
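As a small worked example of the unary-plus-pairwise energy and its MAP labelling, the sketch below evaluates the energy on a chain of four pixels and finds the minimum by brute force. The Potts pairwise term, the numbers in `unary`, and the names `potts` and `crf_energy` are illustrative assumptions; each neighbouring pair is listed once.

```python
from itertools import product

def potts(a, b, penalty=1.0):
    """Pairwise potential: zero for equal labels, a fixed penalty otherwise."""
    return 0.0 if a == b else penalty

def crf_energy(labels, unary, neighbours, pairwise=potts):
    """Sum of unary potentials plus pairwise potentials over neighbouring pairs."""
    energy = sum(unary[i][labels[i]] for i in range(len(labels)))
    for i, j in neighbours:
        energy += pairwise(labels[i], labels[j])
    return energy

# unary[i][l]: cost of pixel i taking label l (made-up numbers, 2 labels).
unary = [[0.1, 2.0], [0.2, 1.5], [1.0, 0.9], [2.0, 0.1]]
neighbours = [(0, 1), (1, 2), (2, 3)]  # a 1-D chain of 4 pixels

# MAP labelling by exhaustive search; only feasible for tiny problems.
best = min(product(range(2), repeat=4),
           key=lambda x: crf_energy(x, unary, neighbours))
best_energy = crf_energy(best, unary, neighbours)
```

Here the minimum is the labelling (0, 0, 1, 1): paying one pairwise penalty is cheaper than forcing all four pixels to share a label. Real problems of this kind are solved with graph-cut based or message-passing inference rather than enumeration.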
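The Robust $P^N$ Model
- The pairwise CRF is extended by the robust $P^N$ potentials [Kohli et al., 2008]
- $\mathcal{S}$: the set of the segments
- The pixels within the same segment are more likely to take the same label
- The form of the robust $P^N$ potentials:
$$\psi_c(\mathbf{x}_c) = \min_{l \in \mathcal{L}} \big(\gamma_c^{\max},\; \gamma_c^{l} + k_c^{l} N_c^{l}\big); \qquad N_c^{l} = \sum_{i \in c} \Delta(x_i \neq l)$$
- The weighted version:
$$\psi_c(\mathbf{x}_c) = \min_{l \in \mathcal{L}} \Big(\gamma_c^{\max},\; \gamma_c^{l} + \sum_{i \in c} w_i k_c^{l}\, \Delta(x_i \neq l)\Big)$$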
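To make the behaviour of the robust potential concrete, here is a minimal sketch that evaluates it for one segment. It assumes the weighted form, in which the cost of label l is a base cost plus a weighted count of the pixels that disagree with l, truncated at a maximum cost. The function name `robust_pn` and all numbers are hypothetical.

```python
def robust_pn(labels, weights, gamma, gamma_max, k):
    """Weighted robust P^N cost of one segment (sketch under stated assumptions).

    labels:    label x_i of each pixel in the segment
    weights:   weight w_i of each pixel
    gamma[l]:  cost when the whole segment agrees with label l
    gamma_max: truncation value, the most the segment can ever cost
    k[l]:      penalty per unit weight for a pixel disagreeing with l
    """
    best = gamma_max
    for l in gamma:
        # Total weight of the pixels that do not take label l.
        disagree = sum(w for x, w in zip(labels, weights) if x != l)
        best = min(best, gamma[l] + k[l] * disagree)
    return best

gamma = {"cow": 0.0, "grass": 0.0}
k = {"cow": 0.5, "grass": 0.5}
weights = [1.0, 1.0, 1.0, 1.0]

mostly_cow = robust_pn(["cow", "cow", "cow", "grass"], weights, gamma, 2.0, k)
mixed = robust_pn(["cow", "cow", "grass", "grass"], weights, gamma, 2.0, k)
truncated = robust_pn(["cow", "cow", "grass", "grass"], weights, gamma, 0.8, k)
```

The cost grows with the number of inconsistent pixels but never exceeds `gamma_max`, which is what makes the potential robust to segments that straddle an object boundary.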
$P^N$-Based Hierarchical CRFs
- A single auxiliary variable $y_c$ is introduced for each $c$, where $c$ is a segment or a clique
- $y_c$ takes its value from an extended label set $\mathcal{L} \cup \{L_F\}$
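$P^N$-Based Hierarchical CRFs
- New cost function over the pixel variables $\mathbf{x}$ and the auxiliary variables $\mathbf{y}$
- The unary potential over $Y$:
$$\psi_c(y_c) = \begin{cases} \gamma_c^{l} & \text{if } y_c = l \in \mathcal{L} \\ \gamma_c^{\max} & \text{if } y_c = L_F \end{cases}$$
- The pairwise potential over $Y$ and $X$:
$$\psi_c(y_c, x_i) = \begin{cases} 0 & \text{if } y_c = L_F \text{ or } y_c = x_i \\ w_i k_c^{l} & \text{otherwise, where } l = y_c \end{cases}$$
- Goal: the new energy function
$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(x_i) + \sum_{i \in \mathcal{V},\, j \in \mathcal{N}_i} \psi_{ij}(x_i, x_j) + \min_{\mathbf{y}} \Big( \sum_{c \in \mathcal{S}} \psi_c^{p}(\mathbf{x}, y_c) + \sum_{c, d \in \mathcal{S}} \psi_{cd}(y_c, y_d) \Big)$$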
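The role of the auxiliary variable can be checked numerically. The sketch below assumes that the penalty index l in the pairwise term is the segment label y_c, and that the weighted robust potential has the truncated form used by `robust_pn`. Under those assumptions, minimising the unary-plus-pairwise cost over y_c gives the same value as the robust potential. `FREE`, `aux_cost`, `min_over_aux` and the numbers are hypothetical.

```python
from itertools import product

FREE = "L_F"  # hypothetical name for the extra label in the extended set

def robust_pn(labels, weights, gamma, gamma_max, k):
    """Direct evaluation of the weighted, truncated robust potential."""
    costs = [gamma[l] + k[l] * sum(w for x, w in zip(labels, weights) if x != l)
             for l in gamma]
    return min(costs + [gamma_max])

def aux_cost(y, labels, weights, gamma, gamma_max, k):
    """Unary cost of y_c plus the pairwise costs between y_c and each pixel."""
    if y == FREE:
        return gamma_max  # unary is gamma_max and every pairwise term is 0
    pairwise = sum(w * k[y] for x, w in zip(labels, weights) if x != y)
    return gamma[y] + pairwise

def min_over_aux(labels, weights, gamma, gamma_max, k):
    """Minimise over the auxiliary variable, including the extra label."""
    return min(aux_cost(y, labels, weights, gamma, gamma_max, k)
               for y in list(gamma) + [FREE])

gamma = {"a": 0.0, "b": 0.3}
k = {"a": 0.5, "b": 0.4}
weights = [1.0, 2.0, 1.0]
gamma_max = 1.2

# Compare the two formulations on every labelling of a 3-pixel segment.
all_equal = all(
    abs(robust_pn(x, weights, gamma, gamma_max, k)
        - min_over_aux(x, weights, gamma, gamma_max, k)) < 1e-12
    for x in product("ab", repeat=3)
)
```

Once the higher-order cost is expressed through y_c, only unary and pairwise terms remain, and pairwise terms between auxiliary variables can be added on top.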
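Recursive Form
- The auxiliary variables of one layer are the input variables of the next layer
- The new energy function:
$$E(\mathbf{x}) = \sum_{i \in \mathcal{V}} \psi_i(x_i) + \sum_{i \in \mathcal{V},\, j \in \mathcal{N}_i} \psi_{ij}(x_i, x_j) + \min_{\mathbf{y}} \Big( \sum_{c \in \mathcal{S}} \psi_c^{p}(\mathbf{x}, y_c) + \sum_{c, d \in \mathcal{S}} \psi_{cd}(y_c, y_d) \Big)$$
- The recursive form:
$$E^{(n)}(\mathbf{x}^{(n-1)}, \mathbf{x}^{(n)}) = \sum_{c \in S^{(n)}} \psi_c^{p}(\mathbf{x}_c^{(n-1)}, x_c^{(n)}) + \sum_{cd \in N^{(n)}} \psi_{cd}(x_c^{(n)}, x_d^{(n)}) + \min_{\mathbf{x}^{(n+1)}} E^{(n+1)}(\mathbf{x}^{(n)}, \mathbf{x}^{(n+1)})$$
- Initial form:
$$E^{(0)}(\mathbf{x}) = \sum_{i \in S^{(0)}} \psi_i(x_i^{(0)}) + \sum_{ij \in N^{(0)}} \psi_{ij}(x_i^{(0)}, x_j^{(0)}) + \min_{\mathbf{x}^{(1)}} E^{(1)}(\mathbf{x}^{(0)}, \mathbf{x}^{(1)})$$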
Hierarchical CRF for Object Segmentation
- Describe the set of potentials used in the object-class segmentation problem
- Include unary potentials for both pixels and segments, pairwise potentials between pixels and between segments, and connective potentials between pixels and their containing segments
Robustness to misleading segmentations
- An unsupervised segmentation algorithm may be misleading: a segment may contain multiple object classes
- Assigning the same label to all pixels of such a segment results in an incorrect labeling
- Overcome this by using segment quality measures [Rabinovich et al., 2009] and [Ren and Malik, 2003]
- Modify the potentials according to a quality-sensitive measure for every segment $c$
- Notation: $\lambda_c$: weight; $\xi_c$: feature-based potential over $c$
Potentials for object class segmentation
- Refer to the elements of each layer as pixels, segments, and super-segments
At the pixel level:
- The unary potentials are computed using a boosted dense feature classifier [Shotton et al., 2006]
- The pairwise potentials follow [Boykov and Jolly, 2001] and [Rother et al., 2004]
Potentials for object class segmentation
At the segment level:
- Segments are initially found using a fine-scale mean-shift algorithm [Comaniciu and Meer, 2002]
- They contain little novel local information, but are strong predictors of consistency
- The potentials learnt at this level are uniform, due to the lack of unique features; however, as they are strongly indicative of local consistency, the penalty associated with breaking them is high
- To encourage neighbouring segments with similar texture to take the same label, pairwise potentials based on the Euclidean distance of normalized histograms of colour are used
Potentials for object class segmentation
At the super-segment level:
- Based upon a coarse mean-shift segmentation, performed over the result of the previous segmentations
- Super-segments contain significantly more internal information than their smaller children
- Propose a unary segment potential based on the histograms of features
Unary Potentials From Dense Features
- Perform texture-based segmentation at the pixel level
- Derived from TextonBoost [Shotton et al., 2006]
- The features are computed on every pixel
- Extend TextonBoost by boosting classifiers defined on multiple dense features together
- Dense-feature shape filters are defined by triplets [f, t, r], where f is a feature type, t is a feature cluster, and r is a rectangular region
- Feature response: given a point i, the number of features of type f belonging to the cluster t in the region r relative to the point i
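The feature response of a shape filter [f, t, r] can be illustrated with a direct count. This sketch assumes that the features of one type f have already been assigned cluster indices, stored per pixel in `cluster_map`, and that the rectangle r is given as an offset and size relative to the point i. The function name, the rectangle encoding, and the clipping at the image border are assumptions for illustration.

```python
def feature_response(cluster_map, point, rect, t):
    """Count pixels of cluster t inside a rectangle placed relative to a point.

    cluster_map: 2-D list with the cluster index of one feature type per pixel
    point:       (row, col) of the pixel i
    rect:        (d_row, d_col, height, width); the offset of the rectangle's
                 top-left corner from the point, then its size
    t:           cluster index to count
    """
    rows, cols = len(cluster_map), len(cluster_map[0])
    top = point[0] + rect[0]
    left = point[1] + rect[1]
    count = 0
    # Clip the rectangle to the image so out-of-bounds parts contribute nothing.
    for r in range(max(top, 0), min(top + rect[2], rows)):
        for c in range(max(left, 0), min(left + rect[3], cols)):
            if cluster_map[r][c] == t:
                count += 1
    return count

cluster_map = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 1, 0],
]
inside = feature_response(cluster_map, point=(1, 1), rect=(-1, 0, 2, 2), t=1)
clipped = feature_response(cluster_map, point=(1, 1), rect=(0, 2, 5, 5), t=1)
```

A practical implementation would typically precompute one integral image per cluster so that each response costs a constant number of lookups; the double loop here only shows what is being counted.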
Histogram-Based Segment Unary Potentials
- Defined over segments and super-segments
- The distribution of dense feature responses is more discriminative than any feature alone
- The unary potential of an auxiliary variable representing a segment is learnt from the normalized histograms of multiple clustered dense features, using multi-class Gentle AdaBoost [Torralba et al., 2004]
Weak classifiers:
- f: the normalized histogram of the feature set
- t: the cluster index
- a: threshold
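A minimal sketch of the inputs to such a classifier: the normalized histogram of cluster indices inside one segment, and a threshold test on a single bin. The stump form (bin t of histogram f compared against threshold a) is an assumption suggested by the three listed parameters, not a formula stated on the slide; the function names and data are hypothetical.

```python
from collections import Counter

def normalised_histogram(cluster_ids, num_clusters):
    """Fraction of the segment's pixels that fall into each feature cluster."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts[t] / total for t in range(num_clusters)]

def weak_classifier(hist, t, a):
    """Assumed decision stump: does bin t of the histogram exceed threshold a?"""
    return hist[t] > a

# Cluster index of one dense feature at each pixel of a segment (made up).
segment = [0, 0, 1, 2, 2, 2, 2, 1]
hist = normalised_histogram(segment, num_clusters=3)
fires = weak_classifier(hist, t=2, a=0.4)
```

Because the histogram is normalized by the segment size, segments of very different areas produce comparable inputs for boosting.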
Histogram-Based Segment Unary Potentials
The segment potential is defined in terms of:
- the response given by the AdaBoost classifier to clique c taking label l
- a truncation threshold
- a normalizing constant
Learning Weights for Hierarchical CRFs
- Uses a coarse-to-fine, layer-based local search scheme over a validation set
- Introduce additional notation for the variables contained in a given layer, and for the labelling of those variables associated with a MAP estimate
- Determine a dominant label $L_c$ for each segment $c$; if there is no such dominant label, set $L_c = L_F$:
$$L_c = \begin{cases} l & \text{if } \sum_{i \in c} \Delta(x_i = l) \ge 0.5\,|c| \\ L_F & \text{otherwise} \end{cases}$$
- The label of a clique must correspond to the dominant label of this clique (segment) in the ground truth (or $L_F$) for the pixels it contains to be correctly labelled
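The dominant-label rule can be written directly from the threshold above. In this sketch `FREE` is a hypothetical name for the value L_F, and when two labels each cover exactly half of the segment the first one encountered is returned, a tie-breaking choice the slide does not specify.

```python
from collections import Counter

FREE = "L_F"  # hypothetical name for the "no dominant label" value

def dominant_label(pixel_labels):
    """Return the label covering at least half of the segment, else FREE."""
    label, count = Counter(pixel_labels).most_common(1)[0]
    return label if count >= 0.5 * len(pixel_labels) else FREE

clear = dominant_label(["cow", "cow", "cow", "grass"])
none = dominant_label(["cow", "grass", "sky"])
```

In the first segment "cow" covers three of four pixels and is dominant; in the second no label reaches half of the pixels, so the segment gets the extra label.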
Learning Weights for Hierarchical CRFs
- At each layer, seek to minimize the discrepancy between the dominant ground-truth label of a clique (segment) and the value of the MAP estimate
- Choose the parameters λ to minimize this discrepancy
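One way to picture this step is a search over candidate values of a single weight, scoring each by how many segments are mislabelled. The sketch below assumes the discrepancy is a count of segments whose MAP label differs from the dominant ground-truth label, skipping segments with no dominant label. That cost, the names `discrepancy` and `pick_weight`, and the stand-in inference function `toy_map` are all hypothetical.

```python
FREE = "L_F"  # hypothetical name for the "no dominant label" value

def discrepancy(map_labels, dominant_labels):
    """Count segments whose MAP label differs from the dominant label.

    Segments without a dominant label are skipped (an assumption).
    """
    return sum(
        1
        for predicted, target in zip(map_labels, dominant_labels)
        if target != FREE and predicted != target
    )

def pick_weight(candidates, run_map, dominant_labels):
    """Keep the candidate weight whose MAP estimate has the lowest discrepancy."""
    return min(candidates,
               key=lambda lam: discrepancy(run_map(lam), dominant_labels))

def toy_map(lam):
    """Stand-in for MAP inference on a validation image (made-up behaviour)."""
    return ["cow", "cow", "grass"] if lam >= 1.0 else ["cow", "grass", "grass"]

dominant = ["cow", "cow", FREE]
best = pick_weight([0.5, 1.0, 2.0], toy_map, dominant)
```

With these toy values the candidate 0.5 mislabels one segment while 1.0 and 2.0 mislabel none, so the first of the tied candidates, 1.0, is kept.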
Algorithm
Quantities used by the algorithm:
- the weighting of unary terms in a layer
- the weighting of pairwise terms in a layer
- a scalar modifier of all terms in a layer
- an arbitrary constant that controls the precision of the final assignment of the weights
Experiments
Two data sets:
MSRC-21 [Shotton et al., 2006]
- Resolution: 320×213 pixels
- 21 object classes
PASCAL VOC 2008 [Everingham et al., 2008, website]
- 511 training, 512 validation and 512 segmented test images
- 20 foreground classes and 1 background class
- 10,057 images for which only the bounding boxes of the objects present in the image are marked
Results on The MSRC-21
Results on The MSRC-21 [25]: J. Shotton et al., CVPR, 2008 [26]: J. Shotton et al., ECCV, 2006 [1]: D. Batra et al., CVPR, 2008 [25]: L. Yang et al., CVPR, 2007
Results on The VOC-2008