Cue Integration in Figure/Ground Labeling
Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley

We present a model of edge and region grouping using a conditional random field built over a scale-invariant representation of images to integrate multiple cues. Our model includes potentials that capture low-level similarity, mid-level curvilinear continuity and high-level object shape. Maximum likelihood parameters for the model are learned from human-labeled ground-truth on a large collection of horse images using belief propagation. Using held-out test data, we quantify the information gained by incorporating generic mid-level cues and high-level shape.
Overview
Conditional Random Field:
  joint model over contours, regions and objects
  integrate low-, mid- and high-level cues
  easy to train and test on large datasets
Figure: pipeline from Pb to CDT (bottom-up grouping), then contours, regions and objects, then output marginals.
Constrained Delaunay Triangulation (CDT)
Constructing a scale-invariant representation from the bottom up:
1. Compute low-level edge map.
2. Trace contours and recursively split them into piecewise linear segments.
3. Use Constrained Delaunay Triangulation to complete gaps and partition the image into dual edges and regions.
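Step 2 above can be sketched as follows. This is a minimal sketch, assuming a recursive farthest-point split whose tolerance is proportional to the chord length, so that the breakpoints do not change when the image is rescaled. The function names and the `rel_tol` value are illustrative and are not taken from the poster.

```python
import math

def point_to_chord_distance(p, a, b):
    """Distance from point p to the line through a and b (or to a if a == b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    chord = math.hypot(dx, dy)
    if chord == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / chord

def split_contour(points, rel_tol=0.05):
    """Return indices of breakpoints of a piecewise linear approximation.

    points  : list of (x, y) tuples tracing one contour, at least two points
    rel_tol : assumed tolerance, as a fraction of the chord length
    """
    def recurse(lo, hi):
        if hi - lo < 2:
            return []
        a, b = points[lo], points[hi]
        chord = math.hypot(b[0] - a[0], b[1] - a[1])
        # Find the interior point farthest from the chord a-b.
        best_i, best_d = max(
            ((i, point_to_chord_distance(points[i], a, b))
             for i in range(lo + 1, hi)),
            key=lambda item: item[1],
        )
        if best_d <= rel_tol * chord:
            return []  # the chord is already a good enough fit
        return recurse(lo, best_i) + [best_i] + recurse(best_i, hi)

    return [0] + recurse(0, len(points) - 1) + [len(points) - 1]

# An L-shaped contour is split once, at its corner.
corner = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
breakpoints = split_contour(corner)
```

Because the tolerance is relative, multiplying every coordinate by the same factor leaves `breakpoints` unchanged.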
CDT edges capture most of the image boundaries
Using P_human, the soft ground-truth label defined on CDT graphs: precision close to 100%.
Pb averaged over CDT edges: no worse than the original Pb.
Increase in asymptotic recall rate: completion of gradientless contours.
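Averaging Pb over a CDT edge can be illustrated with a short sketch. It assumes the Pb map is a 2D grid of values in [0, 1], that edge endpoints are integer (row, column) pixel coordinates, and that the edge is sampled at roughly one point per pixel. The function name and the sampling scheme are assumptions for illustration, not the authors' procedure.

```python
def average_pb_on_edge(pb, p0, p1):
    """Average a Pb map along the straight segment from p0 to p1.

    pb     : 2D list indexed as pb[row][col]
    p0, p1 : integer (row, col) endpoints of one CDT edge
    """
    (r0, c0), (r1, c1) = p0, p1
    # One sample per pixel step along the longer axis of the segment.
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    total = 0.0
    for k in range(n):
        t = k / (n - 1) if n > 1 else 0.0
        r = round(r0 + t * (r1 - r0))
        c = round(c0 + t * (c1 - c0))
        total += pb[r][c]
    return total / n

# Toy Pb map with a strong horizontal boundary on the middle row.
pb_map = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0],
]
along = average_pb_on_edge(pb_map, (1, 0), (1, 2))   # edge lying on the boundary
across = average_pb_on_edge(pb_map, (0, 0), (2, 2))  # edge crossing the boundary
```

An edge that follows the boundary receives a high average, while an edge that only crosses it receives a low one.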
A Random Field for Cue Integration
We consider a conditional random field (CRF) on top of the CDT triangulation graph, with a binary random variable X_e for each edge in the CDT, a binary variable Y_t for every triangle, and a latent node Z which encodes object location. We use a simple linear combination of low-, mid- and high-level cues.
Low-level cues: edge energy (L1) and similarity of brightness/texture (L2).
Mid-level cues: contour continuity and junction frequency (M1) and contour/region labeling consistency (M2).
High-level cues: familiar texture (H1), object region support (H2) and object shape (H3).
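A linear combination of the seven cues listed above can be written as a log-linear model. The notation below is an assumed generic form, not copied from the poster: F_k denotes the total feature value contributed by cue k, θ_k its learned weight, I the image, and N(I, θ) the normalizing constant, written with N so that it is not confused with the latent node Z.

```latex
\begin{align*}
P(X, Y, Z \mid I, \theta)
  &= \frac{1}{\mathcal{N}(I, \theta)}
     \exp\bigl(-E(X, Y, Z \mid I, \theta)\bigr) \\
E(X, Y, Z \mid I, \theta)
  &= -\sum_{k \in \{L1,\, L2,\, M1,\, M2,\, H1,\, H2,\, H3\}}
     \theta_k \, F_k(X, Y, Z, I)
\end{align*}
```

Under this form, dropping a cue family (for example all of H1 to H3) amounts to fixing the corresponding weights to zero.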
Maximum likelihood CRF parameters are fit via gradient descent. We use loopy belief propagation to perform inference, in particular estimating the marginals of X, Y and Z.

Mid-level features
Junctions are parameterized by the number of gradient and completed edges. A feature based on angle governs curvilinear continuity for degree-2 junctions.
Figure: Maximum-likelihood weights for various junction types.
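For a log-linear model, the gradient of the log-likelihood with respect to each weight is the empirical feature average minus the feature expectation under the model. The sketch below illustrates this on a toy chain of three binary edge variables, small enough that expectations are computed exactly by enumeration; on the full CDT graph these expectations would instead be approximated from loopy belief propagation marginals. The two features, the data and the step size are hypothetical choices made for illustration.

```python
import itertools
import math

def features(x):
    """Hypothetical features of a 3-edge labeling x:
    number of 'on' edges, and number of adjacent pairs with equal labels."""
    return [sum(x), sum(1 for a, b in zip(x, x[1:]) if a == b)]

def expected_features(theta, n=3):
    """Exact model expectation of the features, by enumerating all 2^n states."""
    states = list(itertools.product([0, 1], repeat=n))
    scores = [math.exp(sum(t * f for t, f in zip(theta, features(s))))
              for s in states]
    z = sum(scores)  # partition function of the toy model
    expect = [0.0] * len(theta)
    for s, w in zip(states, scores):
        for k, f in enumerate(features(s)):
            expect[k] += (w / z) * f
    return expect

def fit(data, steps=500, lr=0.1):
    """Gradient ascent on the log-likelihood
    (equivalently, gradient descent on its negative)."""
    theta = [0.0, 0.0]
    empirical = [sum(features(x)[k] for x in data) / len(data)
                 for k in range(2)]
    for _ in range(steps):
        model = expected_features(theta)
        theta = [t + lr * (e - m) for t, e, m in zip(theta, empirical, model)]
    return theta

# Toy "ground-truth" labelings of the three edges.
data = [(1, 1, 1), (0, 0, 0), (1, 1, 0), (0, 1, 0)]
theta = fit(data)
```

At convergence the model's expected features match the empirical averages, which is the defining condition of the maximum likelihood solution for this family of models.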
High-level features
A “shapeme” which captures pairs of vertical edges.
Spatial distribution of the shapeme relative to the object center (Z).
Average support mask helps group regions with incoherent appearance.
Quantitative Analysis of Cue Integration
We train and test our approach on a dataset of 344 grayscale horse images. We evaluate the performance of the grouping algorithm against both contours and regions in the human-marked ground-truth. We find that for this dataset with limited pose variation, high-level knowledge greatly boosts grouping performance; nevertheless, mid-level cues still play a significant role.
L+M+H > H+L > M+L > L