OBJ CUT & PoseCut (CVPR 05, ECCV 06). University of Oxford. Philip Torr, M. Pawan Kumar, Pushmeet Kohli and Andrew Zisserman
Conclusion (preview): combining pose inference and segmentation is worth investigating (more tomorrow). Tracking = Detection; Detection = Segmentation; Tracking (pose estimation) = Segmentation.
Segmentation. To distinguish cow and horse? First, the segmentation problem.
Aim: given an image, segment the object (Category Model + Cow Image → Segmented Cow). The segmentation should (ideally) be: shaped like the object, e.g. cow-like; obtained efficiently, in an unsupervised manner; able to handle self-occlusion.
Challenges: intra-class shape variability; intra-class appearance variability; self-occlusion.
Motivation: Magic Wand. Current methods require user intervention: object and background seed pixels (Boykov and Jolly, ICCV 01), or a bounding box of the object (Rother et al., SIGGRAPH 04). Given object and background seed pixels on the cow image, we obtain the segmented image.
Motivation: the problem. Manually intensive, and the segmentation is not guaranteed to be 'object-like' (example: a non-object-like segmentation).
Our Method: combine object detection with segmentation (Borenstein and Ullman, ECCV '02; Leibe and Schiele, BMVC '03), and incorporate global shape priors in the MRF. Detection provides object localization and global shape priors, and automatically segments the object. Note: our method is completely generic, applicable to any object category model.
Outline Problem Formulation Form of Shape Prior Optimization Results
Problem. Labelling m over the set of pixels D; shape prior provided by parameter Θ. Energy: E(m,Θ) = ∑x [Φx(D|mx) + Φx(mx|Θ)] + ∑xy [Ψxy(mx,my) + Φ(D|mx,my)]. Unary terms: a likelihood based on colour, and a unary potential based on distance from Θ. Pairwise terms: a prior and a contrast term. Find the best labelling m* = arg min ∑i wi E(m,Θi), where wi is the weight for sample Θi.
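To make the energy concrete, here is a toy sketch on a 1-D chain of pixels. All potentials are made up for illustration; `unary[x][m]` stands for the combined colour-likelihood and distance-from-Θ terms at pixel x under label m, and each `(w_i, unary_i)` pair plays the role of one weighted shape sample Θi.

```python
def mrf_energy(labels, unary, pairwise):
    """E(m) = sum_x unary[x][m_x] + sum_xy pairwise(m_x, m_y) on a 1-D chain."""
    e = sum(unary[x][m] for x, m in enumerate(labels))
    e += sum(pairwise(labels[x], labels[x + 1]) for x in range(len(labels) - 1))
    return e

def weighted_energy(labels, samples, pairwise):
    """sum_i w_i E(m, Theta_i), with samples = [(w_i, unary_i), ...]."""
    return sum(w * mrf_energy(labels, u, pairwise) for w, u in samples)
```

The real system minimizes this weighted sum over labellings; the sketch only evaluates it for a given m.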
MRF. The probability of a labelling m (labels) over D (pixels) consists of: a likelihood, the unary potential Φx(D|mx) based on the colour of each pixel; and a prior Ψxy(mx,my) which favours the same label for neighbouring pixels (pairwise potentials).
Example (cow image). From the object and background seed pixels we build per-pixel likelihood ratios Φx(D|obj) and Φx(D|bkg) from colour, combined with the pairwise prior Ψxy(mx,my). [Figure: cow image, likelihood ratio (colour), prior.]
Contrast-Dependent MRF. The probability of a labelling in addition has a contrast term Φ(D|mx,my), which favours boundaries that lie on image edges.
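A common form of such a contrast term (a sketch, not the paper's exact constants: `gamma` and `sigma` here are illustrative) makes a label change cheap where the intensity difference across the edge is large:

```python
import math

def contrast_term(ix, iy, mx, my, gamma=1.0, sigma=5.0):
    """Pairwise cost that is only paid when the labels differ, and decays
    with the squared intensity difference, so cuts prefer image edges."""
    if mx == my:
        return 0.0
    return gamma * math.exp(-((ix - iy) ** 2) / (2.0 * sigma ** 2))
```

With this, a boundary between two similar pixels costs nearly `gamma`, while a boundary on a strong image edge costs almost nothing.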
Example (cow image). As before, but the pairwise term is now Ψxy(mx,my) + Φ(D|mx,my). [Figure: cow image, likelihood ratio (colour), prior + contrast.]
Our Model: the Object Category Specific MRF. The probability of a labelling in addition has unary potentials Φx(mx|Θ) which depend on the distance from Θ (the shape parameter).
Example (cow image). The shape prior Θ contributes a distance-from-Θ unary term, combined with the colour likelihood and the contrast-dependent pairwise term. [Figure: shape prior Θ, distance from Θ, likelihood + distance from Θ, prior + contrast.]
Outline. Problem formulation: E(m,Θ) = ∑x [Φx(D|mx) + Φx(mx|Θ)] + ∑xy [Ψxy(mx,my) + Φ(D|mx,my)]. Next: form of shape prior; optimization; results.
Detection BMVC 2004
Layered Pictorial Structures (LPS) Generative model Composition of parts + spatial layout Layer 2 Spatial Layout (Pairwise Configuration) Layer 1 Parts in Layer 2 can occlude parts in Layer 1
Layered Pictorial Structures (LPS) Cow Instance Layer 2 Transformations Θ1 P(Θ1) = 0.9 Layer 1
Layered Pictorial Structures (LPS) Cow Instance Layer 2 Transformations Θ2 P(Θ2) = 0.8 Layer 1
Layered Pictorial Structures (LPS) Unlikely Instance Layer 2 Transformations Θ3 P(Θ3) = 0.01 Layer 1
How to learn the LPS: from video, via motion segmentation; see Kumar, Torr and Zisserman, ICCV 2005.
LPS for Detection Learning Learnt automatically using a set of examples Detection Matches LPS to image using Loopy Belief Propagation Localizes object parts
Detection Like a proposal process.
Pictorial Structures (PS): Fischler and Elschlager, 1973. PS = 2D parts + configuration. Aim: learn pictorial structures in an unsupervised manner. Layered Pictorial Structures (LPS) = parts + configuration + relative depth: identify parts, learn the configuration, learn the relative depth of parts.
Pictorial Structures: affine warp of parts. Each part is a variable; states are image locations AND affine deformations.
Pictorial Structures. Each part is a variable; states are image locations. The MRF favours certain configurations.
Bayesian Formulation (MRF). D = image; Di = pixels ∈ pi, given li (PDF Projection Theorem); z = sufficient statistics. ψ(li,lj) = const if the configuration is valid, 0 otherwise (a Potts model).
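A minimal sketch of that pairwise term, with a toy notion of "valid configuration" (part j sits near an expected offset from part i; the offset and tolerance are invented for illustration, not learnt):

```python
def psi(li, lj, expected_offset, tol=2.0, const=1.0):
    """psi(l_i, l_j): const if part locations li, lj form a valid
    pairwise configuration (within tol of expected_offset), else 0."""
    dx = lj[0] - li[0] - expected_offset[0]
    dy = lj[1] - li[1] - expected_offset[1]
    return const if dx * dx + dy * dy <= tol * tol else 0.0
```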
Defining the likelihood We want a likelihood that can combine both the outline and the interior appearance of a part. Define features which will be sufficient statistics to discriminate foreground and background:
Features Outline: z1 Chamfer distance Interior: z2 Textons Model joint distribution of z1 z2 as a 2D Gaussian.
Chamfer Match Score. Outline (z1): minimum chamfer distance over multiple outline exemplars, dcham = (1/n) Σi min{ minj ||ui − vj||, τ }. [Figure: image, edge image, distance transform.]
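A direct sketch of the truncated chamfer score above. In practice the inner minimum is read off a precomputed distance transform of the edge image; this version computes it by brute force to stay self-contained.

```python
import math

def chamfer_distance(template_pts, edge_pts, tau=10.0):
    """d_cham = (1/n) * sum_i min{ min_j ||u_i - v_j||, tau }:
    mean distance from each template point u_i to its nearest image
    edge point v_j, truncated at tau for robustness to missing edges."""
    total = 0.0
    for u in template_pts:
        nearest = min(math.dist(u, v) for v in edge_pts)
        total += min(nearest, tau)
    return total / len(template_pts)
```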
Texton Match Score. Texture (z2): MRF classifier (Varma and Zisserman, CVPR '03), with multiple texture exemplars x of class t. Textons: 3 × 3 square neighbourhood, vector-quantised in texton space. Descriptor: histogram of the texton labelling, compared with the χ² distance.
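The χ² comparison between two texton histograms can be sketched as follows (the small `eps` guards against empty bins and is an implementation detail, not from the slides):

```python
def chi_square(h1, h2, eps=1e-12):
    """Chi-squared distance between two (normalised) histograms:
    0.5 * sum_k (h1_k - h2_k)^2 / (h1_k + h2_k)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))
```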
Bag of Words / Histogram of Textons. Having slagged off bags of words, I reveal we used one all along; no big deal. So this is like a spatially aware bag-of-words model, using a spatially flexible set of templates to work out our bag of words.
2. Fitting the Model. Efficient likelihood evaluation: cascades of classifiers. Solving the MRF: LBP (use the fast algorithm), GBP if LBP doesn't converge; could use semidefinite programming (2003); recent work shows a second-order cone programming method does best (CVPR 2006).
Efficient Detection of Parts: a cascade of classifiers. At the top level, use chamfer matching and the distance transform for efficient pre-filtering; at lower levels, use the full texture model for verification, with efficient nearest-neighbour speed-ups.
Cascade of Classifiers (for each part): Y. Amit and D. Geman '97; S. Baker and S. Nayar '95.
High Levels based on Outline (x,y)
Side note: chamfer matching is like a linear classifier on the distance-transform image (Felzenszwalb). A tree is a set of linear classifiers; a pictorial structure is a parameterized family of linear classifiers.
Low Levels on Texture. The top levels of the tree use the outline to eliminate patches of the image efficiently, using the chamfer distance on a precomputed distance map; the remaining candidates are evaluated using the full texture model.
Efficient Nearest Neighbour (Goldstein, Platt and Burges, MSR Tech Report, 2003). Convert the fixed-distance search to a rectangle search: bitvectorij(Rk) = 1 if Rk ∈ Ii in dimension j, 0 otherwise. To find the nearest neighbour of x: find the intervals containing x in all dimensions, AND the appropriate bitvectors, then run nearest-neighbour search on the pruned exemplars.
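The pruning step can be sketched as follows, using Python sets in place of hardware bitvectors (the AND of bitvectors becomes set intersection; the rectangle representation `rects[k] = [(lo, hi) per dimension]` is an assumption of this sketch):

```python
def prune_candidates(query, rects):
    """Per dimension, collect the exemplar rectangles whose interval
    contains the query coordinate, then intersect across dimensions.
    Only the surviving exemplars need an exact distance computation."""
    survivors = None
    for j, q in enumerate(query):
        bits = {k for k, r in enumerate(rects) if r[j][0] <= q <= r[j][1]}
        survivors = bits if survivors is None else survivors & bits
    return sorted(survivors)
```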
Recently: solve via integer programming relaxations. SDP formulation (Torr 2001, AI Stats); SOCP formulation (Kumar, Torr & Zisserman, this conference); LBP (Huttenlocher and many others).
Outline Problem Formulation Form of Shape Prior Optimization Results
Optimization. Given image D, find the best labelling m* = arg max p(m|D). Treat the LPS parameter Θ as a latent (hidden) variable in an EM framework: E-step, sample the distribution over Θ; M-step, obtain the labelling m.
E-Step. Given an initial labelling m', determine p(Θ|m',D). Problem: efficiently sampling from p(Θ|m',D). Solution: we develop an efficient sum-product loopy belief propagation (LBP) for matching the LPS, similar to the efficient max-product LBP for the MAP estimate (Felzenszwalb and Huttenlocher, CVPR '04).
Results Different samples localize different parts well. We cannot use only the MAP estimate of the LPS.
M-Step. Given samples from p(Θ|m',D), get the new labelling mnew. Each sample Θi provides: an object localization from which to learn RGB distributions of object and background, and a shape prior for segmentation. Problem: maximize the expected log-likelihood using all samples, and obtain the new labelling efficiently.
M-Step w1 = P(Θ1|m’,D) Cow Image Shape Θ1 RGB Histogram for Object RGB Histogram for Background
M-Step: w1 = P(Θ1|m',D). With shape Θ1, the best labelling m (labels) over D (pixels) is found efficiently using a single graph cut.
Segmentation using Graph Cuts. [Figure: s-t graph with Obj and Bkg terminals; terminal edges carry Φx(D|bkg) + Φx(bkg|Θ) and Φz(D|obj) + Φz(obj|Θ); neighbour edges carry Ψxy(mx,my) + Φ(D|mx,my).]
M-Step w2 = P(Θ2|m’,D) Cow Image Shape Θ2 RGB Histogram for Object RGB Histogram for Background
M-Step: w2 = P(Θ2|m',D). With shape Θ2, the best labelling m (labels) over D (pixels) is found efficiently using a single graph cut.
M-Step: combine the samples, w1·E(m,Θ1) + w2·E(m,Θ2) + …. The best labelling m* = arg min ∑i wi E(m,Θi) is found efficiently using a single graph cut.
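To make the arg min concrete, here is a brute-force version on a tiny binary chain. This is only a sketch: the paper finds the same minimiser exactly and efficiently with a single graph cut, since the weighted sum of the sample energies is again a graph-representable energy; enumeration is used here purely for illustration.

```python
from itertools import product

def best_labelling(n, weighted_unaries, pairwise):
    """Arg min over binary labellings of an n-pixel chain.
    weighted_unaries holds one (already weight-scaled) unary table
    per shape sample Theta_i; pairwise is shared across samples."""
    def total(m):
        e = sum(u[x][m[x]] for u in weighted_unaries for x in range(n))
        e += sum(pairwise(m[x], m[x + 1]) for x in range(n - 1))
        return e
    return min(product((0, 1), repeat=n), key=total)
```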
Outline Problem Formulation Form of Shape Prior Optimization Results
Results Using LPS Model for Cow Image Segmentation
Results Using LPS Model for Cow In the absence of a clear boundary between object and background Image Segmentation
Results Using LPS Model for Cow Image Segmentation
Results Using LPS Model for Cow Image Segmentation
Results Using LPS Model for Horse Image Segmentation
Results Using LPS Model for Horse Image Segmentation
Results Image Our Method Leibe and Schiele
Results: shape only (without the colour term Φx(D|mx)), appearance only (without the shape term Φx(mx|Θ)), and shape + appearance.
Face Detector and ObjCut
Do we really need accurate models? Segmentation boundary can be extracted from edges Rough 3D Shape-prior enough for region disambiguation
Energy of the Pose-specific MRF. The energy to be minimized has a unary term, a pairwise potential (Potts model), and a shape prior. But what should the value of Θ be?
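Written out in the same notation as the ObjCut energy earlier in the talk, the pose-specific energy has the form (a reconstruction from the listed terms; any weighting constants from the paper are omitted):

```latex
E(\mathbf{m},\Theta) \;=\; \sum_{x}\Big(\Phi_x(D \mid m_x) + \Phi_x(m_x \mid \Theta)\Big)
\;+\; \sum_{(x,y)}\Big(\Psi_{xy}(m_x, m_y) + \Phi(D \mid m_x, m_y)\Big)
```

Here Φx(mx|Θ) is the shape prior induced by the pose Θ, and the pairwise sum combines the Potts prior with the contrast term.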
The different terms of the MRF: the likelihood of being foreground given a foreground histogram; the likelihood of being foreground given all the terms; the shape prior model. [Figure: original image, Grimson-Stauffer segmentation, shape prior (distance transform), resulting graph-cuts segmentation.]
Can segment multiple views simultaneously.
Solve via gradient descent Comparable to level set methods Could use other approaches (e.g. Objcut) Need a graph cut per function evaluation
Formulating the Pose Inference Problem.
But… to compute the MAP of E(x) w.r.t. the pose means the unary terms change at EACH iteration, and the max-flow must be recomputed! However, Kohli and Torr showed how dynamic graph cuts can be used to efficiently find MAP solutions for MRFs that change minimally from one time instant to the next (ICCV 05).
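The core update can be sketched as follows (my reading of the dynamic graph cut idea, not code from the paper): fold the change in a node's terminal (unary) capacities into the residual graph left by the previous max-flow, reparameterising when a residual capacity would go negative, since adding a constant to both t-links of a node shifts the energy by a constant without changing the optimal cut.

```python
def update_terminal(res_src, res_snk, new_src, new_snk, old_src, old_snk):
    """Update a node's residual source/sink capacities when its unary
    terms change between iterations, so the previous flow is reused
    instead of recomputing max-flow from scratch."""
    rs = res_src + (new_src - old_src)
    rk = res_snk + (new_snk - old_snk)
    if rs < 0:            # reparameterise: shift both t-link capacities
        rk += -rs         # by the same constant; the optimal cut is
        rs = 0.0          # unchanged
    if rk < 0:
        rs += -rk
        rk = 0.0
    return rs, rk
```

After updating all changed capacities, max-flow is resumed on the residual graph, which converges quickly when few terms changed.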
Dynamic Graph Cuts. Solving problem PA to get solution SA is a computationally expensive operation. If problem PB is similar to PA, the differences between A and B give a simpler problem PB*, and solving that yields SB by a cheaper operation than solving PB from scratch. [Diagram: PA → SA (expensive); differences between A and B → PB* → SB (cheap).]
Dynamic Image Segmentation: reuse the flows from the previous image frame (consecutive frames). [Figure: segmentation obtained; flows in n-edges.]
Our Algorithm. Solve the first segmentation problem on graph Ga by maximum flow, giving the MAP solution and the residual graph Gr. Apply the difference between Ga and Gb to obtain the updated residual graph G', then solve the second segmentation problem on Gb starting from G'. [Diagram: Ga → max-flow → MAP solution + residual graph (Gr); difference between Ga and Gb → updated residual graph G' → MAP solution of Gb.]
Dynamic Graph Cuts vs Active Cuts: our method recycles flow; Active Cuts recycles cuts; both methods recycle search trees.
Experimental Analysis: running time of the dynamic algorithm on an MRF of 2×10⁵ latent variables connected in a 4-neighbourhood.
Segmentation Comparison Grimson-Stauffer Bathia04 Our method
Segmentation Get rid of the ids and not the ideas
Conclusion: combining pose inference and segmentation is worth investigating. Tracking = Detection; Detection = Segmentation; Tracking = Segmentation. Segmentation = SFM??