LOCUS (Learning Object Classes with Unsupervised Segmentation): A variational approach to learning model-based segmentation. John Winn, Microsoft Research.

Presentation transcript:

LOCUS (Learning Object Classes with Unsupervised Segmentation): A variational approach to learning model-based segmentation. John Winn, Microsoft Research Cambridge, with Nebojsa Jojic, MSR Redmond. 7th July 2006.

Overview: Learning object models; The LOCUS model; Experiments & results; Extensions to LOCUS.

Long-term goal: recognise ~10,000 object classes.

Learning from ‘buckets’ of images. (Diagram labels: learning algorithm, horse model, object segmentation, object recognition, object detection.)

LOCUS: object segmentation + horse model.

Related work

Constellation models: weakly supervised; probabilistic framework; sparse; no segmentation. Object class recognition by unsupervised scale-invariant learning. R. Fergus, P. Perona, and A. Zisserman. CVPR 2003. A Bayesian approach to unsupervised one-shot learning of object categories. L. Fei-Fei, R. Fergus, and P. Perona. ICCV 2003.

Fragment-based: dense model; supervised; non-probabilistic; no global shape model. Learning to segment. E. Borenstein and S. Ullman. ECCV 2004. Combining top-down and bottom-up segmentation. E. Borenstein, E. Sharon, and S. Ullman. CVPR 2004.

Codebook-based: probabilistic; dense model; supervised; ad-hoc inference. Combined object categorization and segmentation with an implicit shape model. B. Leibe, A. Leonardis, and B. Schiele. ECCV ‘04.

OBJ CUT: probabilistic; dense model; supervised; requires video.

LOCUS overview. Weakly supervised learning: buckets of images, no annotation required. Probabilistic generative model of both object and background. Dense model: all pixels modelled, not just at interest points. Combines global and local cues: models global shape and local appearance + edges. Iterative inference process: simultaneous localisation, segmentation, pose estimation.

The LOCUS model

LOCUS model (graphical model figure). Shared between images: class shape π, class edge sprite μ_o, σ_o. Different for each image: position & size T, deformation field D, mask m, edge image e, object appearance λ_1, background appearance λ_0, image.

LOCUS model: appearance. The mask m labels each pixel as background or object. Background mixture coefficients λ_0 and object mixture coefficients λ_1 are defined over shared mixture components and generate the image z.
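
As a rough illustration of the appearance slide, the sketch below assumes the shared mixture components are Gaussians over a scalar pixel value and that the object and background differ only in their mixture coefficients. The Gaussian form, the flat mask prior, the function names and all numbers are assumptions made for illustration; the talk does not specify them.

```python
import numpy as np

def component_likelihoods(z, means, stds):
    """Gaussian density of each pixel value under each shared component.

    z: (P,) pixel values; means, stds: (K,) component parameters.
    Returns an array of shape (P, K).
    """
    z = np.asarray(z, dtype=float)[:, None]
    means = np.asarray(means, dtype=float)[None, :]
    stds = np.asarray(stds, dtype=float)[None, :]
    return np.exp(-0.5 * ((z - means) / stds) ** 2) / (stds * np.sqrt(2.0 * np.pi))

def object_posterior(z, means, stds, lam0, lam1, prior_obj=0.5):
    """Per-pixel probability that the mask is 'object'.

    lam0 / lam1 are the background / object mixture coefficients over the
    same K components (hypothetical values, each summing to 1).
    """
    lik = component_likelihoods(z, means, stds)
    p_bg = lik @ np.asarray(lam0, dtype=float)
    p_obj = lik @ np.asarray(lam1, dtype=float)
    num = prior_obj * p_obj
    return num / (num + (1.0 - prior_obj) * p_bg)

# Two shared components: a dark one and a bright one.
means, stds = [0.2, 0.8], [0.1, 0.1]
lam0 = [0.9, 0.1]  # background is mostly dark
lam1 = [0.1, 0.9]  # object is mostly bright
post = object_posterior([0.2, 0.8, 0.5], means, stds, lam0, lam1)
```

With these made-up parameters a dark pixel is assigned mostly to the background, a bright pixel mostly to the object, and a mid-grey pixel is left undecided, which is where the shape and edge cues of the full model would have to break the tie.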

LOCUS model: mask. Each pixel is labelled background or object. An 8-neighbour Markov Random Field (as used in GrabCut) favours segmentation along contrast edges.
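
A minimal sketch of a contrast-sensitive 8-neighbour pairwise term in the style of GrabCut is given below. It assumes a greyscale image, the GrabCut form gamma * exp(-beta * difference^2) / distance with beta set from the mean squared neighbour difference, and gamma = 50; whether LOCUS uses exactly these constants is not stated in the slides, and the helper names are hypothetical.

```python
import numpy as np

# Offsets that visit every undirected 8-neighbour pair exactly once.
OFFSETS = [(0, 1), (1, 0), (1, 1), (1, -1)]

def shifted_pairs(img, dy, dx):
    """Return aligned views (a, b) where b is a's neighbour at offset (dy, dx)."""
    h, w = img.shape
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    a = img[y0:y1, x0:x1]
    b = img[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    return a, b

def contrast_weights(img, gamma=50.0):
    """Cost of giving two neighbouring pixels different mask labels.

    The cost is high in flat regions and low across strong contrast,
    so cuts are cheapest along contrast edges.
    """
    img = np.asarray(img, dtype=float)
    diffs = {off: np.subtract(*shifted_pairs(img, *off)) ** 2 for off in OFFSETS}
    mean_sq = np.mean(np.concatenate([d.ravel() for d in diffs.values()]))
    beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    return {off: gamma * np.exp(-beta * d) / np.hypot(*off) for off, d in diffs.items()}

# A 4x4 image with a vertical step edge between columns 1 and 2.
img = np.zeros((4, 4))
img[:, 2:] = 1.0
w = contrast_weights(img)
```

In this toy image the horizontal links that cross the step edge receive a smaller weight than the links inside the flat regions, while the vertical links (which never cross the edge) keep the full weight gamma.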

LOCUS model: shape/position. The class shape π is mapped into each image by a per-image transformation T_1, T_2, T_3, T_4, ..., T_N.
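
To make the role of the per-image transformation concrete, the sketch below places a class shape map into an image frame under a translation and an isotropic scale. Nearest-neighbour sampling, zero object probability outside the shape, and the parameter names `scale`, `ty`, `tx` are all assumptions for illustration, not details from the talk.

```python
import numpy as np

def transform_shape(pi, out_shape, scale, ty, tx):
    """Place class shape `pi` into an image of size `out_shape`.

    Image pixel (y, x) reads pi at ((y - ty) / scale, (x - tx) / scale)
    using nearest-neighbour lookup; pixels that fall outside pi get
    object probability 0.
    """
    pi = np.asarray(pi, dtype=float)
    height, width = out_shape
    ys, xs = np.mgrid[0:height, 0:width]
    sy = np.floor((ys - ty) / scale).astype(int)
    sx = np.floor((xs - tx) / scale).astype(int)
    inside = (sy >= 0) & (sy < pi.shape[0]) & (sx >= 0) & (sx < pi.shape[1])
    out = np.zeros(out_shape)
    out[inside] = pi[sy[inside], sx[inside]]
    return out

# A 2x2 'shape' of certain-object pixels, doubled in size and shifted.
pi = np.ones((2, 2))
placed = transform_shape(pi, (8, 8), scale=2.0, ty=1, tx=3)
```

Here the 2x2 shape becomes a 4x4 block of object pixels whose top-left corner sits at row 1, column 3 of the image. In the model each image would have its own such transformation, inferred rather than given.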

Iterative inference. (Figure sequence: class shape π and transformations T_1, T_2, T_3, T_4, ..., T_N after iterations #1, #2, #3, #5, #8 and #12.)

Non-rigid objects: translation and scale are not enough. (Figure: class shape π.)

LOCUS model: pose. Class shape π, transformation T, deformation field D (5x5 blocks). A prior ensures smoothness.
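
The slide only says that the deformation field is defined on 5x5 blocks and that a prior ensures smoothness. The sketch below is one simple way to realise that idea, assuming blocks of 5 by 5 pixels, one 2D displacement per block, and a quadratic penalty on differences between neighbouring blocks; the actual prior used in LOCUS may differ.

```python
import numpy as np

def smoothness_penalty(D):
    """Sum of squared differences between 4-neighbouring block displacements.

    D has shape (By, Bx, 2): one (dy, dx) displacement per block.
    A constant field costs nothing; abrupt changes are penalised.
    """
    D = np.asarray(D, dtype=float)
    vertical = D[1:, :, :] - D[:-1, :, :]
    horizontal = D[:, 1:, :] - D[:, :-1, :]
    return float((vertical ** 2).sum() + (horizontal ** 2).sum())

def block_to_pixel_field(D, block=5):
    """Expand a per-block field to a per-pixel field (constant within a block)."""
    D = np.asarray(D)
    return np.repeat(np.repeat(D, block, axis=0), block, axis=1)

# A 3x3 grid of blocks with only the centre block displaced by one pixel.
flat = np.zeros((3, 3, 2))
bumped = flat.copy()
bumped[1, 1] = (1.0, 0.0)

cost_flat = smoothness_penalty(flat)
cost_bumped = smoothness_penalty(bumped)
pixel_field = block_to_pixel_field(bumped)
```

The isolated displacement disagrees with its four neighbours, so it pays a penalty of 4, while the flat field pays nothing. Working per block rather than per pixel keeps the number of deformation variables small.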

LOCUS model: pose. The class shape π is mapped into each image by a combined transformation and deformation TD_1, TD_2, TD_3, ..., TD_N.

LOCUS model: edge. The class edge sprite μ_o, σ_o is related through TD_1, TD_2, TD_3, ..., TD_N to the edge images e of the original images.

LOCUS model: overview. Shared between images: class shape π, class edge sprite μ_o, σ_o. Different for each image: position & size T, deformation field D, mask m, edge image e, object appearance λ_1, background appearance λ_0, image.

Inference. Aim to infer all latent variables. For each image: background appearance λ_0, object appearance λ_1, deformation D, transformation T, mask m. Class variables: shape π, edge sprite μ_o, σ_o. Bayesian inference is carried out using variational message passing with a fully factorised variational distribution. Grid-structured variational free energy terms (relating to the deformation field D and the mask m) are optimised using graph cuts.
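
For reference, the standard mean-field quantities behind a fully factorised variational distribution can be written as follows. This is the generic textbook form, not a transcription of the LOCUS equations: H collects the latent variables listed above (with the per-image variables repeated for every image), V stands for the observed images and edge images, and sums over continuous variables should be read as integrals.

```latex
q(H) = \prod_i q_i(H_i), \qquad
H = \{\lambda_0, \lambda_1, D, T, m, \pi, \mu_o, \sigma_o\}

F(q) = \sum_H q(H) \ln \frac{q(H)}{p(H, V)} \;\ge\; -\ln p(V)

\ln q_j^{*}(H_j) = \big\langle \ln p(H, V) \big\rangle_{\sim q_j} + \text{const}
```

Minimising the free energy F(q) with respect to one factor at a time, holding the others fixed, gives the update in the last line, where the expectation is taken over all factors except q_j.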

Experiments & results

Experiments. LOCUS was applied to 8 sets of 20 images, each set containing objects of the same class: Horses, Faces, Cars (rear), Cars (side), Motorbikes, Aeroplanes, Cows, Trees. For each class, we ran separate experiments for color and texture appearance models.

Results: horses

Results: cars

Results: remaining classes. Cars (rear), Faces, Motorbikes, Planes, Cows, Trees.

Segmentation accuracy (Horses, Cars (side)). LOCUS (color) and LOCUS (texture), unannotated training images: 93.1%, 93.0%, 91.4%, 94.0%. Borenstein et al., hand-segmented training images: 93.6%, -. Each image segmented separately: 88.6%, 82.1%. To evaluate segmentation quantitatively, we used hand segmentations for horses and cars (side).
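
The slide does not define the accuracy metric. A common choice, assumed here, is the fraction of pixels whose inferred mask label agrees with the hand segmentation; the function name and the toy masks below are illustrative only.

```python
import numpy as np

def segmentation_accuracy(pred_mask, true_mask):
    """Fraction of pixels where the predicted and hand-labelled masks agree."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    return float((pred == true).mean())

# Toy example: a 6x6 object in a 10x10 image, with one object row missed.
true_mask = np.zeros((10, 10), dtype=int)
true_mask[2:8, 2:8] = 1
pred_mask = true_mask.copy()
pred_mask[2, 2:8] = 0  # six object pixels wrongly labelled background

acc = segmentation_accuracy(pred_mask, true_mask)
```

Under this definition background pixels count as much as object pixels, so an image dominated by background can score highly even with a poor object outline.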

Object registration: the transformation + deformation field registers object outlines (and some internal edges).

Object registration

Extensions to LOCUS

Recognition + segmentation. Object recognition using only global shape: 88% accuracy overall.

Probabilistic Index Maps. (Figure panels: 2 indices, 9 indices.) Each image has a ‘palette’ of appearance models: palette invariance.

Probabilistic Index Maps

Learning objects from video. (Figure panels: object shape, object edge sprite.)

Locumotion: add flow and track constraints to achieve motion segmentation. Tracking/flow estimation by Larry Zitnick.

Conclusions. LOCUS gives unsupervised segmentations of accuracy equivalent to state-of-the-art supervised methods. The general-purpose model allows: object localisation; pose estimation; object segmentation; motion segmentation/object tracking; object recognition/detection (in combination with a discriminative model).

Questions?