Learning a Region-based Scene Segmentation Model. M. Pawan Kumar, Daphne Koller
Aim: To learn an accurate scene segmentation model. Divide the image into non-overlapping regions; assign each region to a semantic class. Cues: features extracted from each region; spatial prior.
Why Regions? Black or Brown? Shape?
Why Regions? Two concentric circles (shape); inner circle is metallic (texture); outer circle is black (color).
Which Regions? Bottom-up over-segmentation (Mean-Shift, N-cuts): too small to capture useful cues; not faithful to boundaries between scene entities.
Which Regions? Choose regions according to a global energy function: the regions that give the best accuracy.
Outline: Region-based Segmentation Model; Learning from Coarse Labels; Inference; Results
Region-based Segmentation Model (Gould et al., 2009): $f_P$: Pixels → Regions (unknown number of regions); $f_R$: Regions → Classes; graph over regions $G = (V(f_P), E(f_P))$.
Region-based Segmentation Model (Gould et al., 2009)
$E(f) = \sum_{r} w_1(f_R(r))^\top \psi_r(f_P = r) + \sum_{(r,s)} w_2^\top \psi_{rs}(f_P = r, f_P = s)$
$\psi_r$: region features; $\psi_{rs}$: region pairwise features.
Features: boundary contrast; shape, texture, color; internal pixels contrast; fraction of pixels above/below the horizon.
$w$: model parameters to be learnt.
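The energy above is a sum of class-specific unary terms over regions and shared pairwise terms over neighboring region pairs. A minimal sketch of how it could be evaluated once the features are computed; the function names, the dictionary layout, and the toy numbers are illustrative assumptions, not taken from the slides.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    assert len(a) == len(b)
    return sum(x * y for x, y in zip(a, b))


def energy(
    region_class: Dict[int, str],                # f_R: region id -> class label
    region_feats: Dict[int, Vector],             # unary feature vector per region
    pair_feats: Dict[Tuple[int, int], Vector],   # feature vector per neighboring pair
    w1: Dict[str, Vector],                       # one unary weight vector per class
    w2: Vector,                                  # pairwise weight vector shared by all pairs
) -> float:
    """Sum of w1(class of r) . unary features plus w2 . pairwise features."""
    unary = sum(dot(w1[region_class[r]], feats) for r, feats in region_feats.items())
    pairwise = sum(dot(w2, feats) for feats in pair_feats.values())
    return unary + pairwise


# Hypothetical toy instance: two regions, one neighboring pair.
toy_region_class = {0: "sky", 1: "road"}
toy_region_feats = {0: [1.0, 0.0], 1: [0.0, 1.0]}
toy_pair_feats = {(0, 1): [0.5]}
toy_w1 = {"sky": [-1.0, 2.0], "road": [2.0, -1.0]}
toy_w2 = [0.4]
toy_energy = energy(toy_region_class, toy_region_feats, toy_pair_feats, toy_w1, toy_w2)
```

Swapping the two class labels in `toy_region_class` raises the energy, which is the sense in which a lower energy corresponds to a better labeling.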
Outline: Region-based Segmentation Model; Learning from Coarse Labels; Inference; Results
Ground Truth Labeling: Labels from Amazon’s Mechanical Turk. Are they the best regions? NO. Car: top half (transparent), bottom half (solid), wheels (circular). Tree: crown (leafy), trunk (brown).
Ground Truth Labeling: Desired ground truth $f^*$; coarser version $f_C$.
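Learning with Coarse Labels: The refined labeling $f_R$ must be faithful to the coarse labeling $f_C$. Likelihood-based learning would maximize $\sum_i \left( w^\top \psi_i^* - \log Z_i(w) \right)$, where $Z_i(w)$ is a HUGE summation! Marginalize over $f_R$? Very difficult! MAP is the bread-and-butter of vision, so what we need is an accurate MAP estimation algorithm.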
Learning with Coarse Labels: Use a max-margin learning framework. At each iteration: (1) complete the ground truth; (2) find the most violated constraint. Both steps rely on approximate MAP inference.
Outline: Region-based Segmentation Model; Learning from Coarse Labels; Inference; Results
Region Selection: Image; Dictionary of Regions; Human Annotation.
Region Selection: Image; Super-Pixels; Dictionary of Regions. Each super-pixel is covered by exactly one selected region.
Integer Program
Binary variables: $y_r(1) = 1$ iff region $r$ is selected; $y_r(0) = 1$ iff $r$ is not selected.
Minimize the energy: $\min_y \sum_{r,i} \theta_r(i)\, y_r(i) + \sum_{(r,s),i,j} \theta_{rs}(i,j)\, y_{rs}(i,j)$
s.t. $y_r(0) + y_r(1) = 1$ (assign one label to $r$ from $L$)
$y_{rs}(i,0) + y_{rs}(i,1) = y_r(i)$ and $y_{rs}(0,j) + y_{rs}(1,j) = y_s(j)$ (ensure $y_{rs}(i,j) = y_r(i)\, y_s(j)$)
$\sum_{r \text{ covers } u} y_r(1) = 1$ (each super-pixel $u$ is covered by exactly one selected region)
$y_r(i), y_{rs}(i,j) \in \{0,1\}$ (binary variables)
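On a tiny dictionary, the integer program above can be solved by exhaustive enumeration, which makes the covering constraint concrete. The sketch below is only an illustration under stated assumptions: the cost of leaving a region unselected is taken to be zero, pairwise costs apply only when both regions are selected, and all names and numbers are hypothetical. Brute force is exponential in the dictionary size and stands in for the actual solver.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Optional, Tuple


def select_regions(
    num_superpixels: int,
    regions: List[FrozenSet[int]],            # each region = set of super-pixel ids
    unary: List[float],                       # cost of selecting region r
    pairwise: Dict[Tuple[int, int], float],   # cost when both r and s are selected
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Enumerate all binary selections and keep the cheapest feasible one."""
    best_y: Optional[Tuple[int, ...]] = None
    best_energy = float("inf")
    for y in product((0, 1), repeat=len(regions)):
        # Covering constraint: every super-pixel lies in exactly one selected region.
        cover = [0] * num_superpixels
        for r, selected in enumerate(y):
            if selected:
                for u in regions[r]:
                    cover[u] += 1
        if any(c != 1 for c in cover):
            continue
        e = sum(unary[r] for r, selected in enumerate(y) if selected)
        e += sum(cost for (r, s), cost in pairwise.items() if y[r] and y[s])
        if e < best_energy:
            best_y, best_energy = y, e
    return best_y, best_energy


# Hypothetical dictionary over three super-pixels.
toy_regions = [
    frozenset({0}), frozenset({1}), frozenset({2}),
    frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 1, 2}),
]
toy_unary = [1.0, 1.0, 1.0, 1.5, 1.2, 3.5]
toy_pairwise = {(0, 1): 0.2, (1, 2): 0.2, (2, 3): 0.5, (0, 4): 0.1}
toy_selection, toy_cost = select_regions(3, toy_regions, toy_unary, toy_pairwise)
```

In this toy instance four selections are feasible (three singletons, a pair region plus a singleton in two ways, or the single large region), and the cheapest combines region 0 with region 4.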
Linear Program
Variables: $y_r(1) = 1$ iff region $r$ is selected; $y_r(0) = 1$ iff $r$ is not selected.
$\min_y \sum_{r,i} \theta_r(i)\, y_r(i) + \sum_{(r,s),i,j} \theta_{rs}(i,j)\, y_{rs}(i,j)$
s.t. $y_r(0) + y_r(1) = 1$
$y_{rs}(i,0) + y_{rs}(i,1) = y_r(i)$ and $y_{rs}(0,j) + y_{rs}(1,j) = y_s(j)$
$\sum_{r \text{ covers } u} y_r(1) = 1$
$y_r(i), y_{rs}(i,j) \in [0,1]$
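Relaxing the variables from {0,1} to [0,1] enlarges the feasible set, so the linear program can return fractional selections. The sketch below checks only the covering constraint on a hypothetical dictionary of three mutually overlapping regions, where no binary selection is feasible but a half-half-half selection is. The names and the example are assumptions made for illustration.

```python
from itertools import product
from typing import FrozenSet, List, Sequence


def coverage(
    regions: List[FrozenSet[int]], y: Sequence[float], num_superpixels: int
) -> List[float]:
    """For every super-pixel u, the total selection mass of regions covering u."""
    return [
        sum(y[r] for r, region in enumerate(regions) if u in region)
        for u in range(num_superpixels)
    ]


def satisfies_covering(
    regions: List[FrozenSet[int]], y: Sequence[float], num_superpixels: int
) -> bool:
    """True if every super-pixel receives total selection mass exactly 1."""
    return all(abs(c - 1.0) < 1e-9 for c in coverage(regions, y, num_superpixels))


# Three regions, each covering two of the three super-pixels.
triangle = [frozenset({0, 1}), frozenset({1, 2}), frozenset({0, 2})]

# No binary selection covers every super-pixel exactly once ...
binary_feasible = [
    y for y in product((0, 1), repeat=3) if satisfies_covering(triangle, y, 3)
]
# ... but selecting every region "half-way" satisfies the relaxed constraint.
fractional = (0.5, 0.5, 0.5)
fractional_ok = satisfies_covering(triangle, fractional, 3)
```

A fractional optimum of this kind has to be rounded or the relaxation tightened before it yields a valid partition into regions.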
Linear Program: Clique of overlapping and neighboring regions (non-submodular potentials); mutual exclusivity; covering constraint. Computationally expensive? Efficient dual decomposition.
The Learning Approach: At each iteration, given (i) $f_C$ (coarse labeling) and (ii) the current set of parameters: find $f_R$ (refined labeling), which is a region partitioning problem; find the most violated constraint, which is a region partitioning + label assignment problem.
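The two steps of each iteration can be sketched on a toy problem in which every image has a short, explicitly enumerated list of candidate labelings. This is a simplified illustration, not the authors' implementation: enumeration replaces the region-partitioning inference, a plain subgradient step on the hinge loss replaces whatever max-margin optimizer was used, and the `Candidate` class, the learning rate, and the data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    features: List[float]  # joint feature vector of this labeling
    faithful: bool         # True if consistent with the coarse labeling f_C
    loss: float            # task loss with respect to f_C (0 if faithful)


def energy(w: List[float], c: Candidate) -> float:
    """Linear energy w . features; lower is better."""
    return sum(wi * xi for wi, xi in zip(w, c.features))


def train(images: List[List[Candidate]], dim: int,
          lr: float = 0.5, iters: int = 10) -> List[float]:
    w = [0.0] * dim
    for _ in range(iters):
        for cands in images:
            # Step 1: complete the ground truth, i.e. pick the lowest-energy
            # labeling among those faithful to the coarse labeling.
            completed = min((c for c in cands if c.faithful),
                            key=lambda c: energy(w, c))
            # Step 2: most violated constraint via loss-augmented inference.
            violator = min(cands, key=lambda c: energy(w, c) - c.loss)
            # Step 3: subgradient step if the margin constraint is violated.
            slack = violator.loss + energy(w, completed) - energy(w, violator)
            if slack > 0:
                w = [wi - lr * (a - b)
                     for wi, a, b in zip(w, completed.features, violator.features)]
    return w


# Hypothetical single-image training set with three candidate labelings.
toy_image = [
    Candidate([1.0, 0.0], faithful=True, loss=0.0),
    Candidate([0.0, 1.0], faithful=False, loss=1.0),
    Candidate([0.5, 0.5], faithful=True, loss=0.0),
]
learned_w = train([toy_image], dim=2)
map_labeling = min(toy_image, key=lambda c: energy(learned_w, c))
```

After training on this toy image, the minimum-energy labeling is one that agrees with the coarse labeling, and the unfaithful candidate is separated from it by a margin equal to its loss.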
Outline: Region-based Segmentation Model; Learning from Coarse Labels; Inference; Results
Stanford Background Dataset: 715 outdoor scenes; 7 background classes + 1 foreground class; labeled via Amazon’s Mechanical Turk. Available for download: http://www.stanford.edu/~sgould
Completing the Ground Truth
Pixel-wise Accuracy: Pixel-based Model: 67.71%; Human Labeled Regions: 64.85%; Our Approach: 72.49%.
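Pixel-wise accuracy is the fraction of pixels whose predicted class matches the ground-truth class. A minimal sketch of the metric, assuming label maps stored as equally sized nested lists of integer class ids; the function name and the toy maps are illustrative and unrelated to the reported numbers.

```python
from typing import Sequence


def pixel_accuracy(
    pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]
) -> float:
    """Fraction of pixels where the predicted class equals the true class."""
    if len(pred) != len(truth):
        raise ValueError("label maps must have the same number of rows")
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        if len(pred_row) != len(truth_row):
            raise ValueError("label maps must have the same row lengths")
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    if total == 0:
        raise ValueError("label maps must not be empty")
    return correct / total


# Hypothetical 2 x 3 label maps that differ in one pixel.
toy_pred = [[0, 0, 1], [2, 2, 1]]
toy_truth = [[0, 0, 1], [2, 1, 1]]
toy_accuracy = pixel_accuracy(toy_pred, toy_truth)
```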
Examples
Summary: Coarse labels are easy to obtain. Refined labels are easy to train with. Marginalization is not possible. Approximate MAP (required for testing) is used to complete the labeling.
Future Work: Convergence analysis; region features + context; learning with different labelings.
http://dags.stanford.edu Questions?