Jigsaws: joint appearance and shape clustering. John Winn, with Anitha Kannan and Carsten Rother. Microsoft Research, Cambridge.
Patch models. Used for: object recognition/detection; object segmentation. But also: stereo matching, photo stitching; texture synthesis; super-resolution; motion segmentation; image/video compression.
Patch models. Patch clustering/codebook (e.g. Leibe & Schiele). Epitome (Jojic et al.): parameter sharing + translation invariant.
Issues with fixed patch size/shape. Patch includes background: patches containing the same object are not clustered together. Patch excludes part of object: patch is less discriminative. Patch includes occlusion: occluded and unoccluded objects are not clustered together.
Patch size? Size ranges from small (single pixel) to large (entire image). Small patches: more sharing, less discriminative. Large patches: more discriminative, less sharing. Optimal size/shape? Depends on: object size/shape; object variability; size of training set.
Aims of the jigsaw model. Learn patches (jigsaw pieces) which are: 1. Shared: each piece is similar in shape and appearance to many regions of the training images; 2. Discriminative: each piece is as large as possible; 3. Exhaustive: all parts of the training images can be reconstructed from the set of jigsaw pieces.
The Jigsaw model. [Figure: a single jigsaw J, together with images I_1, I_2, ..., I_N, each with its own offset map L_1, L_2, ..., L_N.] Potts model on the offset maps.
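The diagram can be read as follows: each image pixel is explained by one jigsaw pixel, selected through that pixel's entry in the offset map, and the Potts model favours neighbouring pixels that share an offset. A minimal Python sketch of that reading is below. The offset convention (jigsaw index = image index minus offset, wrapped at the jigsaw edges), the 4-connected neighbourhood, the weight `gamma`, and all function names are assumptions made for illustration; the slides do not state them.

```python
# Sketch only: grayscale images as nested lists, offsets as (dy, dx) tuples.
# The wrap-around offset convention is an assumption, not taken from the slides.

def jigsaw_index(y, x, offset, jig_h, jig_w):
    """Map image pixel (y, x) to a jigsaw pixel, wrapping at the jigsaw edges."""
    dy, dx = offset
    return (y - dy) % jig_h, (x - dx) % jig_w


def reconstruct(jigsaw_mean, offset_map):
    """Rebuild an image by copying jigsaw mean values through the offset map."""
    jig_h, jig_w = len(jigsaw_mean), len(jigsaw_mean[0])
    image = []
    for y, row in enumerate(offset_map):
        out_row = []
        for x, offset in enumerate(row):
            zy, zx = jigsaw_index(y, x, offset, jig_h, jig_w)
            out_row.append(jigsaw_mean[zy][zx])
        image.append(out_row)
    return image


def potts_energy(offset_map, gamma=1.0):
    """gamma times the number of 4-connected neighbour pairs with different offsets."""
    h, w = len(offset_map), len(offset_map[0])
    cuts = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and offset_map[y][x] != offset_map[y][x + 1]:
                cuts += 1
            if y + 1 < h and offset_map[y][x] != offset_map[y + 1][x]:
                cuts += 1
    return gamma * cuts


jigsaw_mean = [[10, 20],
               [30, 40]]
# Left column uses offset (0, 0); right column uses offset (0, 1),
# so both columns copy from the first jigsaw column.
offset_map = [[(0, 0), (0, 1)],
              [(0, 0), (0, 1)]]
image = reconstruct(jigsaw_mean, offset_map)
energy = potts_energy(offset_map, gamma=2.0)
```

A lower Potts energy means fewer offset changes between neighbours, which corresponds to larger regions copied intact from the jigsaw.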
Toy example. [Figure: training image and the jigsaw learned from it.] Learned using EM + graph cuts.
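To make the "EM" half of the learning loop more concrete, here is a hedged Python sketch of one possible jigsaw update: with the offset maps held fixed, each jigsaw pixel is set to the average of the image pixels currently mapped to it. This is a simplification for illustration, not the authors' update rule. The function name `update_jigsaw_mean`, the wrap-around offset convention, and the `default` value for unused jigsaw pixels are all assumptions, and the graph-cut step that re-estimates the offset maps is not shown.

```python
def update_jigsaw_mean(images, offset_maps, jig_h, jig_w, default=0.0):
    """Average, per jigsaw pixel, all image pixels currently mapped to it."""
    totals = [[0.0] * jig_w for _ in range(jig_h)]
    counts = [[0] * jig_w for _ in range(jig_h)]
    for image, offsets in zip(images, offset_maps):
        for y, row in enumerate(image):
            for x, value in enumerate(row):
                dy, dx = offsets[y][x]
                # Assumed convention: jigsaw index = image index - offset, wrapped.
                zy, zx = (y - dy) % jig_h, (x - dx) % jig_w
                totals[zy][zx] += value
                counts[zy][zx] += 1
    # Jigsaw pixels that no image pixel maps to fall back to `default`.
    return [[totals[zy][zx] / counts[zy][zx] if counts[zy][zx] else default
             for zx in range(jig_w)]
            for zy in range(jig_h)]


# Two 1x2 images sharing a 1x2 jigsaw.
images = [[[1.0, 3.0]],
          [[5.0, 7.0]]]
offset_maps = [[[(0, 0), (0, 1)]],
               [[(0, 0), (0, 0)]]]
mean = update_jigsaw_mean(images, offset_maps, jig_h=1, jig_w=2)
```

In a full loop this update would alternate with a step that chooses new offset maps given the current jigsaw.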
Dog example. [Figure: training image and 32x32 jigsaw mean.]
Dog example. [Figure panels: reconstructed image; learned segmentation; 32x32 jigsaw mean; epitome reconstruction.]
Faces example. [Figure: 128x128 jigsaw mean learned from 64 images.] Source: Olivetti face database.
Learning the ‘pieces’. [Figure: images I_1, I_2, ..., I_N with offset maps L_1, L_2, ..., L_N and the jigsaw J; the final build shows the jigsaw J alone.]
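One way to picture how regions could be read off an offset map is to group connected pixels that share the same offset, since such a region is copied intact from one part of the jigsaw. The Python sketch below does this with a 4-connected flood fill. Treating each constant-offset region as a candidate piece is an assumption made for illustration, as is the function name `label_regions`; the slides do not spell out the procedure.

```python
def label_regions(offset_map):
    """Label 4-connected regions of constant offset; returns (labels, count)."""
    h, w = len(offset_map), len(offset_map[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue  # already assigned to a region
            labels[sy][sx] = next_label
            stack = [(sy, sx)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] == -1
                            and offset_map[ny][nx] == offset_map[y][x]):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels, next_label


offset_map = [[(0, 0), (0, 0), (2, 1)],
              [(0, 0), (2, 1), (2, 1)]]
labels, n_regions = label_regions(offset_map)
```

Here the six pixels fall into two regions, one per distinct offset, and the boundary between them is where the offset changes.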
Faces example. Results of shape clustering on the face images.
Object recognition (preliminary). Training set: 20 street images (10 labelled); 64x64 jigsaw. Allow patches to deform (as in LayoutCRF, CVPR 2006). Accuracy improves (~1%) if an additional 10 unlabelled images are included when learning the jigsaw.
Work in progress… Training larger jigsaws on 100s of images; incorporating shape clustering into the probabilistic model; learning additional invariances, e.g. to illumination; object recognition results on MSRC and other datasets.
Conclusions. The jigsaw model allows learning the shape and appearance of objects or object parts in images. It can also handle occlusion. Clustering shape and appearance is much more powerful for recognition than appearance alone. The model can be used as a ‘plug-and-play’ replacement for fixed-size patches in any existing patch-based system.
Thank you