LOCUS (Learning Object Classes with Unsupervised Segmentation) A variational approach to learning model-based segmentation. John Winn, Microsoft Research Cambridge, with Nebojsa Jojic, MSR Redmond. 7th July 2006
Overview Learning object models. The LOCUS model. Experiments & results. Extensions to LOCUS.
Goal Long Term Goal Recognise ~10,000 object classes.
Learning from ‘buckets’ of images Horse model Learning algorithm Object Segmentation Object Recognition Object Detection
Object segmentation + Horse model LOCUS
Related work
Constellation models Weakly supervised. Probabilistic framework. Sparse. No segmentation. Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003. A Bayesian approach to unsupervised One-Shot learning of Object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003.
Fragment-based Dense model. Supervised. Non-probabilistic. No global shape model. Learning to segment. E. Borenstein and S. Ullman. ECCV 2004. Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004.
Codebook-based Probabilistic. Dense model. Supervised. Ad-hoc inference. Combined object categorization and segmentation with an implicit shape model. B. Leibe, A. Leonardis, and B. Schiele. ECCV 2004.
OBJ CUT Probabilistic. Dense model. Supervised. Requires video.
LOCUS overview Weakly supervised learning: buckets of images, no annotation required. Probabilistic generative model of both object and background. Dense model: all pixels modelled, not just at interest points. Combines global and local cues: models global shape and local appearance + edges. Iterative inference process: simultaneous localisation, segmentation, pose estimation.
The LOCUS model
LOCUS model Shared between images: class shape π, class edge sprite μ_o, σ_o. Different for each image: deformation field D, position & size T, edge image e, image, object appearance λ_1, background appearance λ_0, mask m.
LOCUS model: appearance Mask m (background / object). Background mixture coefficients λ_0. Object mixture coefficients λ_1. Image z. Shared mixture components (figure).
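As a rough illustration of how the mask selects between the two sets of mixture coefficients, the sketch below scores one pixel under a shared set of components. The one-dimensional Gaussian components, the function names and all numbers are assumptions made for illustration; the slide does not specify the form of the components.

```python
import math

def gaussian_pdf(z, mean, std):
    """Density of a 1-D Gaussian; stands in for one shared appearance component."""
    return math.exp(-0.5 * ((z - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def pixel_likelihood(z, mask, components, lam_bg, lam_obj):
    """p(z | m): mixture over the shared components, with weights chosen by the mask."""
    weights = lam_obj if mask == 1 else lam_bg
    return sum(w * gaussian_pdf(z, mean, std)
               for w, (mean, std) in zip(weights, components))

# Hypothetical shared components (mean, std) over grey-level intensity.
components = [(0.2, 0.1), (0.8, 0.1)]
lam_bg = [0.9, 0.1]    # assumed: this image's background is mostly dark
lam_obj = [0.1, 0.9]   # assumed: this image's object is mostly bright

bright = 0.8
p_obj = pixel_likelihood(bright, 1, components, lam_bg, lam_obj)
p_bg = pixel_likelihood(bright, 0, components, lam_bg, lam_obj)
```

With these made-up numbers a bright pixel is better explained with the mask set to object than to background, which is the signal the appearance term contributes to segmentation.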
LOCUS model: mask (background / object) An 8-neighbour Markov random field (as used in GrabCut) favours segmentation along contrast edges.
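A minimal sketch of a contrast-sensitive 8-neighbour pairwise term in the style of GrabCut follows. The exponential form, the choice of beta from the mean squared difference, and the division by pixel distance follow the GrabCut formulation as commonly described; whether LOCUS uses exactly these settings is an assumption, as are the function names and the toy image.

```python
import math

# Offsets that visit each 8-neighbour pair exactly once.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def neighbour_pairs(height, width):
    """Yield every pair of 8-connected pixels once."""
    for r in range(height):
        for c in range(width):
            for dr, dc in OFFSETS:
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < height and 0 <= c2 < width:
                    yield (r, c), (r2, c2)

def contrast_weights(img, gamma=1.0):
    """Cost of giving two neighbours different labels; low across strong edges."""
    pairs = list(neighbour_pairs(len(img), len(img[0])))
    sq = [(img[a[0]][a[1]] - img[b[0]][b[1]]) ** 2 for a, b in pairs]
    mean_sq = sum(sq) / len(sq)
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    weights = {}
    for (a, b), d2 in zip(pairs, sq):
        dist = math.hypot(a[0] - b[0], a[1] - b[1])
        weights[(a, b)] = gamma * math.exp(-beta * d2) / dist
    return weights

# Toy grey-level image with a vertical contrast edge before the last column.
img = [[0.0, 0.0, 1.0],
       [0.0, 0.0, 1.0]]
pair_costs = contrast_weights(img)
flat_cost = pair_costs[((0, 0), (0, 1))]   # pair inside a uniform region
edge_cost = pair_costs[((0, 1), (0, 2))]   # pair straddling the edge
```

Because cutting across the contrast edge is cheaper than cutting through the uniform region, a minimum-cost labelling tends to place the segmentation boundary on the edge.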
LOCUS model: shape/position Class shape π. Transformations T_1, T_2, T_3, T_4, …, T_N (one per image).
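To make the role of the transformation concrete, here is a small sketch that places a class shape map into an image frame using a scale and a translation. Nearest-neighbour resampling, the parameterisation of T as a scale plus an offset, and the function name are assumptions for illustration only.

```python
def transform_shape(pi, out_h, out_w, scale, offset):
    """Place a class shape map into an image frame by scaling and translating.

    pi     : 2-D list of object probabilities in the class reference frame
    scale  : image pixels per reference-frame pixel
    offset : (row, col) of the shape's top-left corner in the image
    Image pixels not covered by the transformed shape get probability 0.
    """
    h, w = len(pi), len(pi[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # Inverse mapping: find the reference pixel that lands on (r, c).
            src_r = int((r - offset[0]) // scale)
            src_c = int((c - offset[1]) // scale)
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = pi[src_r][src_c]
    return out

# Hypothetical 2x2 class shape, doubled in size and shifted by (1, 2).
pi = [[0.9, 0.1],
      [0.8, 0.2]]
prior = transform_shape(pi, out_h=6, out_w=6, scale=2, offset=(1, 2))
```

The result acts as a per-pixel prior on the mask for one image; a different scale and offset would be inferred for each image in the set.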
Iterative inference Iteration #1: class shape π and transformations T_1, …, T_N (figure).
Iterative inference Iteration #2: class shape π and transformations T_1, …, T_N (figure).
Iterative inference Iteration #3: class shape π and transformations T_1, …, T_N (figure).
Iterative inference Iteration #5: class shape π and transformations T_1, …, T_N (figure).
Iterative inference Iteration #8: class shape π and transformations T_1, …, T_N (figure).
Iterative inference Iteration #12: class shape π and transformations T_1, …, T_N (figure).
Non-rigid objects Class shape π. Translation and scale are not enough.
LOCUS model: pose Class shape π, transformation T, deformation field D (5x5 blocks). Prior ensures smoothness.
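The sketch below shows one way a block-wise deformation field and a smoothness penalty could look. It assumes one integer displacement per block of 5x5 pixels and a squared-difference penalty between adjacent blocks; the slide states only that the field uses 5x5 blocks and that a prior ensures smoothness, so both choices and all names here are hypothetical.

```python
def smoothness_penalty(field):
    """Sum of squared differences between displacements of adjacent blocks."""
    rows, cols = len(field), len(field[0])
    total = 0.0
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((0, 1), (1, 0)):
                i2, j2 = i + di, j + dj
                if i2 < rows and j2 < cols:
                    dy = field[i][j][0] - field[i2][j2][0]
                    dx = field[i][j][1] - field[i2][j2][1]
                    total += dy * dy + dx * dx
    return total

def deform(shape, field, block=5):
    """Warp a shape map by shifting each pixel by its block's displacement."""
    h, w = len(shape), len(shape[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dy, dx = field[r // block][c // block]
            src_r, src_c = r - dy, c - dx
            if 0 <= src_r < h and 0 <= src_c < w:
                out[r][c] = shape[src_r][src_c]
    return out

# 10x10 shape with a single object pixel, covered by a 2x2 grid of blocks.
shape = [[0.0] * 10 for _ in range(10)]
shape[2][2] = 1.0
uniform = [[(1, 0), (1, 0)], [(1, 0), (1, 0)]]   # every block moves down one pixel
uneven = [[(0, 0), (1, 0)], [(1, 0), (1, 0)]]    # one block disagrees

warped = deform(shape, uniform)
smooth = smoothness_penalty(uniform)
rough = smoothness_penalty(uneven)
```

A uniform field costs nothing under this penalty, while a field whose blocks disagree is penalised, which is the kind of behaviour a smoothness prior is meant to encourage.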
LOCUS model: pose Class shape π. TD_1, TD_2, TD_3, …, TD_N.
LOCUS model: edge Class edge sprite μ_o, σ_o. TD_1, TD_2, TD_3, …, TD_N. Edge images e. Original images.
LOCUS model: overview Shared between images: class shape π, class edge sprite μ_o, σ_o. Different for each image: deformation field D, position & size T, edge image e, image, object appearance λ_1, background appearance λ_0, mask m.
Inference Aim to infer all latent variables. For each image: background appearance λ_0, object appearance λ_1, deformation D, transformation T, mask m. Class variables: shape π, edge sprite μ_o, σ_o. Bayesian inference is carried out using variational message passing with a fully factorised variational distribution. Optimisation of grid-structured variational free energy terms (relating to the deformation field D and the mask m) is achieved using graph cuts.
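For reference, the generic form of a fully factorised variational approximation can be written as follows. This is the standard mean-field formulation, not a derivation specific to LOCUS; the symbols H (all latent variables) and V (the observed images and edge images) are notation assumed here.

```latex
% Assumed notation: H = all latent variables, V = observed images and edge images.
\begin{align}
  q(H) &= \prod_i q_i(H_i)
    && \text{fully factorised variational distribution} \\
  \ln p(V) &\ge \mathcal{L}(q) = \sum_{H} q(H) \ln \frac{p(H, V)}{q(H)}
    && \text{lower bound; the free energy is } -\mathcal{L}(q) \\
  \ln q_j^{*}(H_j) &= \mathbb{E}_{\prod_{i \ne j} q_i}\!\left[ \ln p(H, V) \right] + \text{const}
    && \text{update for one factor, others held fixed}
\end{align}
```

Each update can only increase the bound, so cycling through the factors gives an iterative inference scheme. For the grid-structured terms, the slide indicates that graph cuts are used to optimise the corresponding free energy terms rather than an update of this closed form.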
Experiments & results
Experiments LOCUS was applied to 8 sets of 20 images, each set containing objects of the same class: horses, faces, cars (rear), cars (side), motorbikes, aeroplanes, cows, trees. For each class, we ran separate experiments for color and texture appearance models.
Results: horses
Results: cars
Results: remaining classes Cars (rear), Faces, Motorbikes, Planes, Cows, Trees
Segmentation accuracy To evaluate segmentation quantitatively, we used hand segmentations for horses and cars (side). Columns: Horses, Cars (side). LOCUS (color), LOCUS (texture), unannotated training images: 93.1% 93.0% 91.4% 94.0%. Borenstein et al., hand-segmented training images: 93.6%, -. Each image segmented separately: 88.6%, 82.1%.
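A plausible way to compute such an accuracy figure is the fraction of pixels whose object/background label agrees with the hand segmentation. The slide does not define the metric, so this per-pixel definition, the function name and the toy masks are assumptions.

```python
def segmentation_accuracy(predicted, ground_truth):
    """Fraction of pixels whose label matches the hand segmentation."""
    total = 0
    correct = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, true_row):
            total += 1
            correct += int(p == t)
    return correct / total

# Toy 2x4 masks (1 = object, 0 = background) differing in one pixel.
hand_mask = [[0, 1, 1, 0],
             [0, 1, 1, 0]]
inferred_mask = [[0, 1, 1, 0],
                 [0, 1, 0, 0]]
accuracy = segmentation_accuracy(inferred_mask, hand_mask)
```

Here seven of the eight pixels agree, giving an accuracy of 0.875. Note that a per-pixel measure like this rewards correct background pixels as much as correct object pixels.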
Object registration Transformation + deformation field registers object outlines (and some internal edges).
Object registration
Extensions to LOCUS
Recognition + segmentation Object recognition using only global shape: Overall: 88% accuracy.
Probabilistic Index Maps 2 indices, 9 indices. Each image has a ‘palette’ of appearance models – palette invariance.
Probabilistic Index Maps
Learning objects from video Object shape Object edge sprite
Locumotion Add flow and track constraints to achieve motion segmentation: Tracking/flow estimation by Larry Zitnick
Conclusions LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods. General-purpose model allows: object localisation, pose estimation, object segmentation, motion segmentation/object tracking, object recognition/detection (in combination with discriminative model).
Questions?