Generative Models of Images of Objects S. M. Ali Eslami Joint work with Chris Williams Nicolas Heess John Winn June 2012 UoC TTI.

Slides:



Advertisements
Similar presentations
Part 2: Unsupervised Learning
Advertisements

O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.
The Helmholtz Machine P Dayan, GE Hinton, RM Neal, RS Zemel
Weakly supervised learning of MRF models for image region labeling Jakob Verbeek LEAR team, INRIA Rhône-Alpes.
Unsupervised Learning Clustering K-Means. Recall: Key Components of Intelligent Agents Representation Language: Graph, Bayes Nets, Linear functions Inference.
CS590M 2008 Fall: Paper Presentation
Advanced topics.
The Shape Boltzmann Machine S. M. Ali Eslami Nicolas Heess John Winn CVPR 2012 Providence, Rhode Island A Strong Model of Object Shape.
Agenda Introduction Bag-of-words models Visual words with spatial location Part-based models Discriminative methods Segmentation and recognition Recognition-based.
Part 4: Combined segmentation and recognition by Rob Fergus (MIT)
Learning to Combine Bottom-Up and Top-Down Segmentation Anat Levin and Yair Weiss School of CS&Eng, The Hebrew University of Jerusalem, Israel.
Lecture 31: Modern object recognition
Yuanlu Xu Human Re-identification: A Survey.
Mixture of trees model: Face Detection, Pose Estimation and Landmark Localization Presenter: Zhang Li.
Presented by: Mingyuan Zhou Duke University, ECE September 18, 2009
LOCUS (Learning Object Classes with Unsupervised Segmentation) A variational approach to learning model- based segmentation. John Winn Microsoft Research.
Foreground Modeling The Shape of Things that Came Nathan Jacobs Advisor: Robert Pless Computer Science Washington University in St. Louis.
Contour Based Approaches for Visual Object Recognition Jamie Shotton University of Cambridge Joint work with Roberto Cipolla, Andrew Blake.
Pedestrian Detection in Crowded Scenes Dhruv Batra ECE CMU.
São Paulo Advanced School of Computing (SP-ASC’10). São Paulo, Brazil, July 12-17, 2010 Looking at People Using Partial Least Squares William Robson Schwartz.
Model: Parts and Structure. History of Idea Fischler & Elschlager 1973 Yuille ‘91 Brunelli & Poggio ‘93 Lades, v.d. Malsburg et al. ‘93 Cootes, Lanitis,
Region labelling Giving a region a name. Image Processing and Computer Vision: 62 Introduction Region detection isolated regions Region description properties.
Object Recognition by Parts Object recognition started with line segments. - Roberts recognized objects from line segments and junctions. - This led to.
MSRC Summer School - 30/06/2009 Cambridge – UK Hybrids of generative and discriminative methods for machine learning.
The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects By John Winn & Jamie Shotton CVPR 2006 presented by Tomasz.
Transfer Learning of Object Classes: From Cartoons to Photographs NIPS Workshop Inductive Transfer: 10 Years Later Geremy Heitz Gal Elidan Daphne Koller.
Pattern Recognition. Introduction. Definitions.. Recognition process. Recognition process relates input signal to the stored concepts about the object.
AN ANALYSIS OF SINGLE- LAYER NETWORKS IN UNSUPERVISED FEATURE LEARNING [1] Yani Chen 10/14/
CSC321: Introduction to Neural Networks and Machine Learning Lecture 20 Learning features one layer at a time Geoffrey Hinton.
Deep Boltzman machines Paper by : R. Salakhutdinov, G. Hinton Presenter : Roozbeh Gholizadeh.
Cue Integration in Figure/Ground Labeling Xiaofeng Ren, Charless Fowlkes and Jitendra Malik, U.C. Berkeley We present a model of edge and region grouping.
Object Recognition by Parts Object recognition started with line segments. - Roberts recognized objects from line segments and junctions. - This led to.
Topic Models in Text Processing IR Group Meeting Presented by Qiaozhu Mei.
Boltzmann Machines and their Extensions S. M. Ali Eslami Nicolas Heess John Winn March 2013 Heriott-Watt University.
A General Framework for Tracking Multiple People from a Moving Camera
Classifying Images with Visual/Textual Cues By Steven Kappes and Yan Cao.
Geoffrey Hinton CSC2535: 2013 Lecture 5 Deep Boltzmann Machines.
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Ch 4. Linear Models for Classification (1/2) Pattern Recognition and Machine Learning, C. M. Bishop, Summarized and revised by Hee-Woong Lim.
O BJ C UT M. Pawan Kumar Philip Torr Andrew Zisserman UNIVERSITY OF OXFORD.
Category Independent Region Proposals Ian Endres and Derek Hoiem University of Illinois at Urbana-Champaign.
Jigsaws: joint appearance and shape clustering John Winn with Anitha Kannan and Carsten Rother Microsoft Research, Cambridge.
Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework N 工科所 錢雅馨 2011/01/16 Li-Jia Li, Richard.
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
Convolutional Restricted Boltzmann Machines for Feature Learning Mohammad Norouzi Advisor: Dr. Greg Mori Simon Fraser University 27 Nov
Part 4: combined segmentation and recognition Li Fei-Fei.
Object Recognizing. Deep Learning Success in 2012 DeepNet and speech processing.
Computer Vision Lecture 7 Classifiers. Computer Vision, Lecture 6 Oleh Tretiak © 2005Slide 1 This Lecture Bayesian decision theory (22.1, 22.2) –General.
SUPERVISED AND UNSUPERVISED LEARNING Presentation by Ege Saygıner CENG 784.
Deep Learning Overview Sources: workshop-tutorial-final.pdf
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Object Recognition by Parts
Learning Deep Generative Models by Ruslan Salakhutdinov
Summary of “Efficient Deep Learning for Stereo Matching”
Energy models and Deep Belief Networks
Multimodal Learning with Deep Boltzmann Machines
LOCUS: Learning Object Classes with Unsupervised Segmentation
Nonparametric Semantic Segmentation
Object Recognition by Parts
Object detection as supervised classification
Object Recognition by Parts
Learning to Combine Bottom-Up and Top-Down Segmentation
“The Truth About Cats And Dogs”
Object Recognition by Parts
Brief Review of Recognition + Context
The topic discovery models
Regulation Analysis using Restricted Boltzmann Machines
CSC321 Winter 2007 Lecture 21: Some Demonstrations of Restricted Boltzmann Machines Geoffrey Hinton.
Object Recognition by Parts
Object Recognition with Interest Operators
Presentation transcript:

Generative Models of Images of Objects S. M. Ali Eslami Joint work with Chris Williams Nicolas Heess John Winn June 2012 UoC TTI

Classification

Localization

Foreground/Background Segmentation

Parts-based Object Segmentation

Segment this This talk’s focus

The segmentation task 8 The imageThe segmentation

The segmentation task The generative approach Construct joint model of image and segmentation Learn parameters given dataset Return probable segmentation at test time Some benefits of this approach Flexible with regards to data: – Unsupervised training, – Semi-supervised training. Can inspect quality of model by sampling from it 9

Outline FSA – Factoring shapes and appearances Unsupervised learning of parts (BMVC 2011) ShapeBM – A strong model of FG/BG shape Realism, generalization capability (CVPR 2012) MSBM – Parts-based object segmentation Supervised learning of parts for challenging datasets 10

Factored Shapes and Appearances For Parts-based Object Understanding (BMVC 2011)

12

13

Factored Shapes and Appearances Goal Construct joint model of image and segmentation. Factor appearances Reason about shape independently of its appearance. Factor shapes Represent objects as collections of parts. Systematic combination of parts generates objects’ complete shapes. Learn everything Explicitly model variation of appearances and shapes. 14

Factored Shapes and Appearances 15 Schematic diagram

Factored Shapes and Appearances 16 Graphical model

Factored Shapes and Appearances 17 Shape model

Factored Shapes and Appearances 18 Shape model

Factored Shapes and Appearances Continuous parameterization Factor appearances – Finds probable assignment of pixels to parts without having to enumerate all part depth orderings. – Resolves ambiguities by exploiting knowledge about appearances. 19 Shape model

Factored Shapes and Appearances 20 Handling occlusion

Factored Shapes and Appearances Goal Instead of learning just a template for each part, learn a distribution over such templates. Linear latent variable model Part l ’s mask is governed by a Factor Analysis-like distribution: where is a low-dimensional latent variable, is the factor loading matrix and is the mean mask. 21 Learning shape variability

Factored Shapes and Appearances 22 Appearance model

Factored Shapes and Appearances 23 Appearance model

Factored Shapes and Appearances Goal Learn a model of each part’s RGB values that is as informative as possible about its extent in the image. Position-agnostic appearance model Learn about distribution of colors across images, Learn about distribution of colors within images. Sampling process For each part: 1.Sample an appearance ‘class’ for each part, 2.Samples the parts’ pixels from the current class’ feature histogram. 24 Appearance model

Factored Shapes and Appearances 25 Appearance model

Factored Shapes and Appearances Use EM to find a setting of the shape and appearance parameters that approximately maximizes : 1.Expectation: Block Gibbs and elliptical slice sampling (Murray et al., 2010) to approximate, 2.Maximization: Gradient descent optimization to find where 26 Learning

Existing generative models 27 A comparison Factored parts Factored shape and appearance Shape variability Appearance variability LSM Frey et al. ✓ (layers) ✓ (FA) Sprites Williams and Titsias ✓ (layers) LOCUS Winn and Jojic ✓✓ (deformation) ✓ (colors) MCVQ Ross and Zemel ✓✓ (templates) SCA Jojic et al. ✓✓ (convex) ✓ (histograms) FSA ✓ (softmax) ✓✓ (FA) ✓ (histograms)

Results

Learning a model of cars 29 Training images

Learning a model of cars Model details Number of parts: 3 Number of latent shape dimensions: 2 Number of appearance classes: 5 30

Learning a model of cars 31 Shape model weights Convertible – CoupeLow – High

Learning a model of cars 32 Latent shape space

Learning a model of cars 33 Latent shape space

Other datasets 34 Training dataMean modelFSA samples

Other datasets 35

Segmentation benchmarks Datasets Weizmann horses: 127 train – 200 test. Caltech4: – Cars: 63 train – 60 test, – Faces: 335 train – 100 test, – Motorbikes: 698 train – 100 test, – Airplanes: 700 train – 100 test. Two variants Unsupervised FSA: Train given only RGB images. Supervised FSA: Train using RGB images + their binary masks. 36

Segmentation benchmarks 37 HorsesCarsFacesMotorbikesAirplanes GrabCut Rother et al. 83.9%45.1%83.7%82.4%84.5% Borenstein et al.93.6% LOCUS Winn and Jojic 93.1%91.4% Arora et al.95.1%92.4%83.1%93.1% ClassCut Alexe et al. 86.2%93.1%89.0%90.3%89.8% Unsupervised FSA87.3%82.9%88.3%85.7%88.7% Supervised FSA88.0%93.6%93.3%92.1%90.9%

The Shape Boltzmann Machine A Strong Model of Object Shape (CVPR 2012)

What do we mean by a model of shape? A probabilistic distribution: Defined on binary images Of objects not patches Trained using limited training data 39

Weizmann horse dataset 40 Sample training images 327 images

What can one do with an ideal shape model? 41 Segmentation

What can one do with an ideal shape model? 42 Image completion

What can one do with an ideal shape model? 43 Computer graphics

What is a strong model of shape? We define a strong model of object shape as one which meets two requirements: 44 Realism Generates samples that look realistic Generalization Can generate samples that differ from training images Training images Real distribution Learned distribution

Existing shape models 45 A comparison RealismGeneralization GloballyLocally Mean ✓ Factor Analysis ✓✓ Fragments ✓✓ Grid MRFs/CRFs ✓✓ High-order potentials~ ✓✓ Database ✓✓ ShapeBM ✓✓✓

Existing shape models 46 Most commonly used architectures MRFMean sample from the model

Shallow and Deep architectures 47 Modeling high-order and long-range interactions MRF RBM DBM

From the DBM to the ShapeBM 48 Restricted connectivity and sharing of weights DBMShapeBM Limited training data. Reduce the number of parameters: 1.Restrict connectivity, 2.Restrict capacity, 3.Tie parameters.

Shape Boltzmann Machine 49 Architecture in 2D Top hidden units capture object pose Given the top units, middle hidden units capture local (part) variability Overlap helps prevent discontinuities at patch boundaries

ShapeBM inference 50 Block-Gibbs MCMC image reconstructionsample 1sample n ~500 samples per second

ShapeBM learning Maximize with respect to 1.Pre-training Greedy, layer-by-layer, bottom-up, ‘Persistent CD’ MCMC approximation to the gradients. 2.Joint training Variational + persistent chain approximations to the gradients, Separates learning of local and global shape properties. 51 Stochastic gradient descent ~2-6 hours on the small datasets that we consider

Results

Weizmann horses – 327 images – hidden units Sampled shapes 53 Evaluating the Realism criterion Weizmann horses – 327 images Data FA Incorrect generalization RBM Failure to learn variability ShapeBM Natural shapes Variety of poses Sharply defined details Correct number of legs (!)

Weizmann horses – 327 images – hidden units Sampled shapes 54 Evaluating the Realism criterion Weizmann horses – 327 images

Sampled shapes 55 Evaluating the Generalization criterion Weizmann horses – 327 images – hidden units Sample from the ShapeBM Closest image in training dataset Difference between the two images

Interactive GUI 56 Evaluating Realism and Generalization Weizmann horses – 327 images – hidden units

Imputation scores 1.Collect 25 unseen horse silhouettes, 2.Divide each into 9 segments, 3.Estimate the conditional log probability of a segment under the model given the rest of the image, 4.Average over images and segments. 57 Quantitative comparison Weizmann horses – 327 images – hidden units MeanRBMFAShapeBM Score

Multiple object categories Train jointly on 4 categories without knowledge of class: 58 Simultaneous detection and completion Caltech-101 objects – 531 images – hidden units Shape completion Sampled shapes

What does h 2 do? Weizmann horses Pose information 59 Multiple categories Class label information Number of training images Accuracy

A Generative Model of Objects For Parts-based Object Segmentation (under review)

Joint Model 61

Joint model 62 Schematic diagram

Multinomial Shape Boltzmann Machine 63 Learning a model of pedestrians

Multinomial Shape Boltzmann Machine 64 Learning a shape model for pedestrians

Inference in the joint model Seeding Initialize inference chains at multiple seeds. Choose the segmentation which (approximately) maximizes likelihood of the image. Capacity Resize inferences in the shape model at run-time. Superpixels User image superpixels to refine segmentations. 65 Practical considerations

66

67

Quantitative results 68 PedestriansFGBGUpperLowerHeadAverage Bo and Fowlkes73.3%81.1%73.6%71.6%51.8%69.5% MSBM71.6%73.8%69.9%68.5%54.1%66.6% Top Seed61.6%67.3%60.8%54.1%43.5%56.4% CarsBGBodyWheelWindowBumperAverage ISM93.2%72.2%63.6%80.5%73.8%86.8% MSBM94.6%72.7%36.8%74.4%64.9%86.0% Top Seed92.2%68.4%28.3%63.8%45.4%81.8%

Summary Generative models of images by factoring shapes and appearances. The Shape Boltzmann Machine as a strong model of object shape. The Multinomial Shape Boltzmann Machine as a strong model of parts-based object shape. Inference in generative models for parts-based object segmentation. 69

Questions "Factored Shapes and Appearances for Parts-based Object Understanding" S. M. Ali Eslami, Christopher K. I. Williams (2011) British Machine Vision Conference (BMVC), Dundee, UK "The Shape Boltzmann Machine: a Strong Model of Object Shape" S. M. Ali Eslami, Nicolas Heess and John Winn (2012) Computer Vision and Pattern Recognition (CVPR), Providence, USA MATLAB GUI available at

Shape completion 71 Evaluating Realism and Generalization Weizmann horses – 327 images – hidden units

Constrained shape completion 72 Evaluating Realism and Generalization Weizmann horses – 327 images – hidden units ShapeBM NN

Further results 73 Sampling and completion Caltech motorbikes – 798 images – hidden units Training images ShapeBM samples Sample generalization Shape completion

Further results 74 Constrained completion Caltech motorbikes – 798 images – hidden units ShapeBM NN