Jigsaws: joint appearance and shape clustering. John Winn, with Anitha Kannan and Carsten Rother. Microsoft Research, Cambridge.


Patch models. Used for: object recognition/detection; object segmentation. But also: stereo matching, photo stitching; texture synthesis; super-resolution; motion segmentation; image/video compression.

Patch models. Patch clustering/codebook (e.g. Leibe & Schiele). Epitome (Jojic et al.): parameter sharing + translation invariant.

Issues with fixed patch size/shape. Patch includes background: patches containing the same object are not clustered together. Patch excludes part of object: patch is less discriminative. Patch includes occlusion: occluded and unoccluded objects are not clustered together.

Patch size? Small (single pixel): more sharing, less discriminative. Large (entire image): more discriminative, less sharing. Optimal size/shape? Depends on: object size/shape; object variability; size of training set.

Aims of jigsaw model. Learn patches (jigsaw pieces) which are: 1. Shared: each piece is similar in shape and appearance to many regions of the training images; 2. Discriminative: each piece is as large as possible; 3. Exhaustive: all parts of the training images can be reconstructed from the set of jigsaw pieces.

The Jigsaw model [diagram: jigsaw J; images I_1, I_2, ..., I_N, each with its own offset map L_1, L_2, ..., L_N]. Potts model: [equation missing from transcript]
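The relationship in the diagram can be illustrated with a minimal sketch: each image pixel is read from the jigsaw through its entry in the offset map, and a Potts term penalises neighbouring pixels whose offsets differ. The function names, the wrap-around indexing, the 4-connected neighbourhood and the `gamma` weight are all assumptions made for illustration; the slide's own Potts equation is not preserved in the transcript, so this is a standard Potts form rather than the authors' exact definition.

```python
# Illustrative sketch only. Function names, the modulo (wrap-around) lookup
# and the gamma weight are assumptions, not taken from the slides.

def reconstruct(jigsaw, offsets):
    """Rebuild an image by copying jigsaw pixels through an offset map.

    jigsaw  : 2D list (Hj x Wj) of pixel values, e.g. the jigsaw mean.
    offsets : 2D list (H x W) of (dy, dx) tuples, one per image pixel.
    Image pixel (y, x) is read from jigsaw pixel
    ((y - dy) mod Hj, (x - dx) mod Wj).
    """
    hj, wj = len(jigsaw), len(jigsaw[0])
    return [
        [jigsaw[(y - dy) % hj][(x - dx) % wj] for x, (dy, dx) in enumerate(row)]
        for y, row in enumerate(offsets)
    ]


def potts_energy(offsets, gamma=1.0):
    """Return gamma times the number of 4-connected neighbour pairs
    whose offsets differ (a standard Potts penalty)."""
    h, w = len(offsets), len(offsets[0])
    disagreements = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and offsets[y][x] != offsets[y][x + 1]:
                disagreements += 1
            if y + 1 < h and offsets[y][x] != offsets[y + 1][x]:
                disagreements += 1
    return gamma * disagreements


# A 2x2 jigsaw used twice, side by side, in a 2x4 image.
jigsaw = [[1, 2],
          [3, 4]]
offsets = [[(0, 0), (0, 0), (0, 2), (0, 2)],
           [(0, 0), (0, 0), (0, 2), (0, 2)]]

image = reconstruct(jigsaw, offsets)   # [[1, 2, 1, 2], [3, 4, 3, 4]]
energy = potts_energy(offsets)         # 2.0: one boundary crossing per row
```

Regions of constant offset correspond to pieces copied intact from the jigsaw, so under this kind of penalty a low energy favours a few large pieces over many small ones.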

Toy example [figure panels: training image; jigsaw]. Learned using EM + graph cuts.
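As a rough illustration of the jigsaw half of such an alternating scheme, the sketch below recomputes the jigsaw mean with the offset maps held fixed: every jigsaw pixel becomes the average of the image pixels currently mapped to it. This is a simplification under stated assumptions. The function name, the wrap-around lookup and the `prior_mean` fallback are hypothetical, variances and priors are omitted, and the offset-map update (the graph-cut step) is not shown.

```python
# Illustrative sketch only. The update rule is a plain average; the name
# update_jigsaw_mean and the prior_mean fallback are assumptions.

def update_jigsaw_mean(images, offset_maps, jigsaw_shape, prior_mean=0.0):
    """Average the image pixels that each jigsaw pixel currently explains.

    images       : list of 2D lists of pixel values.
    offset_maps  : list of 2D lists of (dy, dx) tuples, one map per image.
    jigsaw_shape : (Hj, Wj) size of the jigsaw.
    prior_mean   : value used for jigsaw pixels that no image pixel maps to.
    """
    hj, wj = jigsaw_shape
    total = [[0.0] * wj for _ in range(hj)]
    count = [[0] * wj for _ in range(hj)]
    for image, offsets in zip(images, offset_maps):
        for y, row in enumerate(offsets):
            for x, (dy, dx) in enumerate(row):
                jy, jx = (y - dy) % hj, (x - dx) % wj
                total[jy][jx] += image[y][x]
                count[jy][jx] += 1
    return [
        [total[jy][jx] / count[jy][jx] if count[jy][jx] else prior_mean
         for jx in range(wj)]
        for jy in range(hj)
    ]


# One 1x4 image in which a 1x2 piece appears twice with slight noise.
images = [[[10, 20, 12, 22]]]
offset_maps = [[[(0, 0), (0, 0), (0, 2), (0, 2)]]]

mean = update_jigsaw_mean(images, offset_maps, (1, 2))   # [[11.0, 21.0]]
```

Alternating an update like this with a re-estimation of the offset maps is the general shape of the learning loop; how closely it matches the authors' procedure is not specified in the transcript.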

Dog example [figure panels: training image; 32x32 jigsaw mean]

Dog example [figure panels: reconstructed image; learned segmentation; 32x32 jigsaw mean; epitome reconstruction]

Faces example [figure: 128x128 jigsaw mean; 64 images]. Source: Olivetti face database.

Learning the ‘pieces’ [diagram: jigsaw J; images I_1, I_2, ..., I_N, each with its own offset map L_1, L_2, ..., L_N]

Faces example Results of shape clustering on the face images

Object recognition (preliminary). Training set: 20 street images. 64x64 jigsaw. Allow patches to deform (as in LayoutCRF, CVPR 2006).

Object recognition (preliminary). Training set: 20 street images (10 labelled). 64x64 jigsaw. Accuracy improves (~1%) if you include an additional 10 unlabelled images when learning the jigsaw. Allow patches to deform (as in LayoutCRF, CVPR 2006).

Work in progress: training larger jigsaws on 100s of images; incorporating shape clustering into the probabilistic model; learning additional invariances, e.g. to illumination; object recognition results on MSRC and other datasets.

Conclusions. The jigsaw model allows learning the shape and appearance of objects or object parts in images; it can also handle occlusion. Clustering shape and appearance is much more powerful for recognition than appearance alone. Jigsaw pieces can be used as a ‘plug-and-play’ replacement for fixed-size patches in any existing patch-based system.

Thank you