Learning to Detect Natural Image Boundaries Using Local Brightness, Color, and Texture Cues
David Martin, Charless Fowlkes, Jitendra Malik
UC Berkeley Vision Group
NIPS 2002, Vancouver
Multiple Cues for Grouping
Many cues for perceptual grouping:
–Low-level: brightness, color, texture, depth, motion
–Mid-level: continuity, closure, convexity, symmetry, …
–High-level: familiar objects and configurations
This talk: learn a local cue-combination rule from data
[Figure: example image patches of non-boundaries vs. boundaries; panels labeled I, T, B, C]
Goal and Outline
Goal: model the posterior probability of a boundary Pb(x,y,θ) at each pixel and orientation using local cues.
Method: supervised learning using a dataset of 12,000 segmentations of 1,000 images by 30 subjects.
Outline of talk:
1. Three cues: brightness, color, texture
2. Cue calibration
3. Cue combination
4. Comparison with other approaches
–Canny 1986; Konishi/Yuille/Coughlan/Zhu
5. Pb images
Brightness and Color Features
1976 CIE L*a*b* colorspace
Brightness Gradient BG(x,y,r,θ)
–χ² difference of L* distributions in the two halves of a disc of radius r at (x,y), split at orientation θ
Color Gradient CG(x,y,r,θ)
–χ² difference of a* and b* distributions
[Figure: disc of radius r centered at (x,y)]
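As an illustration, here is a minimal NumPy sketch of a half-disc χ² gradient. This is not the authors' code; the bin count, the L* range of (0, 100), and the disc geometry are assumptions chosen to keep the example short:

```python
import numpy as np

def chi2(g, h, eps=1e-10):
    # Chi-squared distance between two normalized histograms
    return 0.5 * np.sum((g - h) ** 2 / (g + h + eps))

def brightness_gradient(L, x, y, r, theta, bins=16):
    # BG(x,y,r,theta): compare L* histograms of the two half-discs
    # of radius r at (x,y), split by a diameter at angle theta.
    # Assumes (x,y) is at least r pixels away from the image border.
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    disc = xx ** 2 + yy ** 2 <= r ** 2
    side = (xx * np.sin(theta) - yy * np.cos(theta)) > 0
    patch = L[y - r:y + r + 1, x - r:x + r + 1]

    def half_hist(mask):
        h, _ = np.histogram(patch[mask], bins=bins, range=(0.0, 100.0))
        h = h.astype(float)
        return h / max(h.sum(), 1.0)   # normalize to unit mass

    return chi2(half_hist(disc & side), half_hist(disc & ~side))
```

CG(x,y,r,θ) would apply the same comparison to histograms of a* and b* values over the same pair of half-discs.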
Texture Feature
Texture Gradient TG(x,y,r,θ)
–χ² difference of texton histograms in the two half-discs
–Textons are vector-quantized filter outputs
[Figure: image and its texton map]
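A minimal sketch of texton computation, using a toy difference-of-Gaussians bank as a stand-in for the paper's full filter bank; the scales and the number of textons here are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.cluster.vq import kmeans2

def texton_map(img, n_textons=32, sigmas=(1, 2, 4)):
    # Per-pixel feature vector of filter responses: a small
    # difference-of-Gaussians bank stands in for the paper's filters.
    F = np.stack([gaussian_filter(img, s) - gaussian_filter(img, 2 * s)
                  for s in sigmas], axis=-1)
    F = F.reshape(-1, len(sigmas)).astype(float)
    # Vector-quantize the responses with k-means;
    # each pixel's cluster label is its texton.
    _, labels = kmeans2(F, n_textons, minit='++')
    return labels.reshape(img.shape)
```

TG(x,y,r,θ) then applies the same half-disc χ² comparison as in the brightness sketch above, but to histograms of texton labels rather than L* values.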
Cue Calibration
All free parameters optimized on training data:
Brightness Gradient
–scale; bin/kernel sizes for KDE
Color Gradient
–scale; bin/kernel sizes for KDE; joint vs. marginal histograms
Texture Gradient
–filter bank: scale, multiscale?
–histogram comparison: L1, L2, L∞, χ², EMD (sketched below)
–number of textons
–image-specific vs. universal textons
Localization parameters for each cue (see paper)
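For reference, the candidate histogram distances in NumPy (a sketch; the 1-D EMD identity assumes equal-mass histograms on a uniform bin grid):

```python
import numpy as np

# Candidate distances between normalized histograms g and h
def l1(g, h):
    return np.abs(g - h).sum()

def l2(g, h):
    return np.sqrt(((g - h) ** 2).sum())

def linf(g, h):
    return np.abs(g - h).max()

def chi2(g, h, eps=1e-10):
    return 0.5 * (((g - h) ** 2) / (g + h + eps)).sum()

def emd_1d(g, h):
    # For equal-mass 1-D histograms on a uniform grid, the earth
    # mover's distance reduces to the L1 distance between the
    # cumulative histograms (up to the bin width).
    return np.abs(np.cumsum(g) - np.cumsum(h)).sum()
```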
Classifiers for Cue Combination
Classification Trees
–top-down splits to maximize information gain; error bounded
Density Estimation
–adaptive bins using k-means
Logistic Regression, 3 variants
–linear and quadratic terms
–confidence-rated generalization of AdaBoost (Schapire & Singer)
Hierarchical Mixtures of Experts (Jordan & Jacobs)
–up to 8 experts, initialized top-down, fit with EM
Support Vector Machines (libsvm, Chang & Lin)
These classifiers range over the bias/variance, parametric/non-parametric, and simple/complex trade-offs.
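Since the logistic model is the one the talk ultimately recommends, here is a minimal sketch of fitting it by gradient descent. This is illustrative only, not the authors' training procedure; the learning rate and iteration count are assumptions:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    # X: (n, d) local cue values (e.g. BG, CG, TG per pixel/orientation)
    # y: (n,) 0/1 labels derived from the human segmentations
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])    # append a bias column
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))   # predicted P_b
        w -= lr * Xb.T @ (p - y) / n        # gradient of mean log-loss
    return w   # P_b(x) = sigmoid(w . [x, 1]); weights show cue importance
```

The fitted weights make the cue-combination rule directly interpretable, which is what supports the claim later in the talk that all cues end up weighted approximately equally.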
Classifier Comparison
[Figure: performance comparison of the classifiers above]
Cue Combinations
[Figure: performance of individual cues and their combinations]
Alternate Approaches
Canny Detector
–Canny 1986
–MATLAB implementation
–with and without hysteresis
Second Moment Matrix
–Nitzberg/Mumford/Shiota 1993
–cf. Förstner and Harris corner detectors
–used by Konishi et al. in a learning framework
–logistic model trained on the eigenspectrum
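As an illustration of the eigenspectrum features, a minimal NumPy/SciPy sketch of the smoothed second moment matrix (the Sobel gradient operator and smoothing scale are assumptions, not the settings used by Konishi et al.); the two eigenvalues per pixel would feed the logistic model:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def second_moment_eigs(img, sigma=2.0):
    # Image gradients
    Ix = sobel(img, axis=1)
    Iy = sobel(img, axis=0)
    # Smoothed outer products of the gradient (a 2x2 matrix per pixel)
    Ixx = gaussian_filter(Ix * Ix, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    Iyy = gaussian_filter(Iy * Iy, sigma)
    # Closed-form eigenvalues of the symmetric 2x2 matrix
    trace = Ixx + Iyy
    disc = np.sqrt((Ixx - Iyy) ** 2 + 4 * Ixy ** 2)
    return 0.5 * (trace + disc), 0.5 * (trace - disc)
```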
Two Decades of Boundary Detection
Pb Images I
[Figure columns: Canny, 2MM (second moment matrix), Us (Pb), Human, Image]
Pb Images II
[Figure columns: Canny, 2MM (second moment matrix), Us (Pb), Human, Image]
Pb Images III
[Figure columns: Canny, 2MM (second moment matrix), Us (Pb), Human, Image]
Summary and Conclusion
1. A simple linear model is sufficient for cue combination
–all cues are weighted approximately equally in the logistic model
2. A proper texture edge model is not optional for complex natural images
–texture suppression is not sufficient!
3. Significant improvement over the state of the art in boundary detection
–Pb is useful for higher-level processing
4. The empirical approach is critical for both cue calibration and cue combination
Segmentation data and Pb images are available on the web