The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects By John Winn & Jamie Shotton CVPR 2006 presented by Tomasz.

Presentation transcript:

The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects, by John Winn & Jamie Shotton, CVPR 2006. Presented by Tomasz Malisiewicz for CMU’s Misc-Read, April 26, 2006.

Talk Overview: Objective; CRF → HRF → LayoutCRF; LayoutCRF Potentials; Learning; Inference; Results; Summary.

LayoutCRF Objectives: To detect and segment partially occluded objects of a known category; to detect multiple object instances that may occlude each other; to define a part labeling that densely covers the object of interest; to model various types of occlusion (FG/BG, BG/FG, FG/FG).

Related Work: Constellation models of Fergus et al. (sparsely detected interest points); fragment-based models of Ullman and Borenstein.

Conditional Random Field (Lafferty ’01): A random field globally conditioned on the observation X. It is a discriminative framework in which we model P(Y|X) and do not explicitly model the marginal P(X).

Hidden Random Field (Szummer ’05): An extension of the CRF with a hidden layer of variables. In this work the hidden variables represent object ‘parts’. The mapping from hidden part labels to output labels is deterministic.

LayoutCRF: An HRF with asymmetric pair-wise potentials, extended with a set of discrete-valued instance transformations {T_1, …, T_M} for the M foreground object instances.

LayoutCRF (only one non-background class is considered at a time): There are M+1 instance labels, y_i ∈ {0, 1, …, M}. Each object instance has a separate set of H part labels, so h_i ∈ {0, 1, …, H × M}.
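The deterministic relationship between part labels and instance labels can be sketched in a few lines. The indexing convention below is an assumption made for illustration (the slide only gives the label ranges): label 0 is background, and instance m owns the contiguous block of part labels (m-1)·H+1 … m·H. The function names are hypothetical.

```python
def instance_of(h: int, H: int) -> int:
    """Map a part label h in {0, ..., H*M} to an instance label y in {0, ..., M}.

    Assumed layout (not stated on the slide): 0 is background, and
    instance m owns part labels (m-1)*H + 1 through m*H.
    """
    if h == 0:
        return 0
    return (h - 1) // H + 1


def part_index(h: int, H: int) -> int:
    """Index (0 .. H-1) of a foreground part label within its instance."""
    if h == 0:
        raise ValueError("background has no part index")
    return (h - 1) % H
```

With H = 4, labels 1 to 4 map to instance 1 and labels 5 to 8 map to instance 2, so two pixels carrying the same part of different cars still receive different values of h.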

LayoutCRF: Each transformation T represents the translation and left/right flip of an object instance by indexing all possible integer pixel translations for each flip orientation. Each T is linked to every h_i.
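As a rough illustration of how large this discrete transformation space is, the sketch below enumerates every integer translation for both flip orientations. The `Transform` tuple and `enumerate_transforms` are hypothetical names, and restricting translations to the image extent is an assumption.

```python
from typing import List, NamedTuple


class Transform(NamedTuple):
    dx: int      # horizontal translation in pixels
    dy: int      # vertical translation in pixels
    flip: bool   # True if the instance is mirrored left/right


def enumerate_transforms(width: int, height: int) -> List[Transform]:
    """All integer pixel translations inside the image, for each flip orientation."""
    return [
        Transform(dx, dy, flip)
        for flip in (False, True)
        for dy in range(height)
        for dx in range(width)
    ]
```

The space has 2 × width × height entries, which is why T is treated as a single discrete variable per instance rather than one variable per pixel.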

LayoutCRF Potentials. Unary potentials: use local information to infer part labels (randomized decision trees). Asymmetric pair-wise potentials: measure local part compatibilities. Instance potentials: encourage the correct long-range spatial layout of parts for each object instance.

LayoutCRF Potentials: Unary. A set of decision trees is used, each trained on a random subset of the data (this improves generalization and efficiency). Each DT returns a distribution over part labels, and the outputs of the K DTs are averaged. Each non-terminal node in a DT evaluates an intensity difference or absolute intensity difference between a learned pair of pixels and compares it to a learned threshold. The pixel pair is taken from a window of size D around pixel i.
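A minimal sketch of the two ingredients described above: the binary test at a non-terminal node and the averaging of K tree outputs. The function names, the `(row, col)` offset representation, and the direction of the threshold comparison are assumptions; the sketch also assumes the offsets keep both pixels inside the image.

```python
from typing import List, Sequence, Tuple


def node_test(image: Sequence[Sequence[float]], i: Tuple[int, int],
              offset_a: Tuple[int, int], offset_b: Tuple[int, int],
              threshold: float, use_abs: bool) -> bool:
    """Evaluate one decision-tree node at pixel i = (row, col).

    offset_a and offset_b are the learned pixel pair, given as (row, col)
    offsets inside the window around i. Bounds checking is omitted.
    """
    r, c = i
    a = image[r + offset_a[0]][c + offset_a[1]]
    b = image[r + offset_b[0]][c + offset_b[1]]
    diff = a - b
    if use_abs:
        diff = abs(diff)
    return diff > threshold


def average_distributions(dists: List[List[float]]) -> List[float]:
    """Average the part-label distributions returned by the K trees."""
    k = len(dists)
    return [sum(d[p] for d in dists) / k for p in range(len(dists[0]))]
```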

Layout Consistency (for pair-wise potentials): Neighboring pixels whose labels are not layout-consistent are not part of the same object. In the slide's figure, colors represent part labels. A label is layout-consistent with itself and with those labels that are adjacent in the grid ordering shown in the figure.
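One way to read the layout-consistency rule is as a direction-aware check on grid coordinates. The sketch below assumes each part label is a `(column, row)` position in the part grid, and that moving one pixel to the right (or down) may either stay in the same grid cell or advance by one cell in that direction. This directional reading is an assumption consistent with the asymmetric pair-wise potentials, not something spelled out on the slide.

```python
from typing import Tuple

Part = Tuple[int, int]  # (grid column, grid row) of a part label


def layout_consistent(part_i: Part, part_j: Part, direction: str) -> bool:
    """Is part_j a layout-consistent neighbor of part_i?

    direction is "right" if pixel j is the right-hand neighbor of pixel i,
    or "below" if pixel j is directly beneath pixel i.
    """
    (ai, bi), (aj, bj) = part_i, part_j
    if direction == "right":
        return bj == bi and aj in (ai, ai + 1)
    if direction == "below":
        return aj == ai and bj in (bi, bi + 1)
    raise ValueError("direction must be 'right' or 'below'")
```

Under this reading, a pixel labeled (2, 3) may have (2, 3) or (3, 3) to its right, but not (1, 3), because the grid ordering would then run backwards.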

Distinguished Transitions:
1. Background: h_i and h_j are BG labels.
2. Consistent FG: h_i and h_j are layout-consistent FG labels.
3. Object edge: one label is BG, the other is a part label lying on the object edge.
4. Class occlusion: one label is an interior FG label, the other is a BG label.
5. Instance occlusion: both are FG labels, but not layout-consistent (at least one label is an object-edge label).
6. Inconsistent interior FG: both labels are interior FG labels, but not layout-consistent (rare).
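The six cases above can be sketched as a small classifier over a pair of neighboring labels. The `Part` record, the use of `None` for background, and the directional consistency helper are all assumptions introduced for illustration.

```python
from typing import NamedTuple, Optional


class Part(NamedTuple):
    col: int       # grid column of the part
    row: int       # grid row of the part
    on_edge: bool  # True if the part lies on the object edge


def consistent(hi: Part, hj: Part, direction: str) -> bool:
    """Assumed directional layout-consistency check (j is right of / below i)."""
    if direction == "right":
        return hj.row == hi.row and hj.col in (hi.col, hi.col + 1)
    return hj.col == hi.col and hj.row in (hi.row, hi.row + 1)


def transition_type(hi: Optional[Part], hj: Optional[Part],
                    direction: str = "right") -> str:
    """Classify a neighboring label pair; None stands for a background label."""
    if hi is None and hj is None:
        return "background"
    if hi is None or hj is None:
        fg = hj if hi is None else hi
        return "object_edge" if fg.on_edge else "class_occlusion"
    if consistent(hi, hj, direction):
        return "consistent_fg"
    if hi.on_edge or hj.on_edge:
        return "instance_occlusion"
    return "inconsistent_interior_fg"
```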

LayoutCRF Potentials: Pair-wise. The value of the pair-wise potential varies according to the transition type. e_ij is an image-based edge cost that encourages object edges to align with image boundaries. Its contrast term is estimated separately for each image.
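The slide does not give the functional form of e_ij. The sketch below assumes the contrast-sensitive form commonly used in graph-cut segmentation, e_ij = β0 + exp(−β (x_i − x_j)²), with β set per image to 1 / (2 · mean squared neighbor difference). Both the form and the names `contrast_beta`, `edge_cost`, and `beta0` are assumptions; the image is a grayscale list of rows.

```python
import math
from typing import Sequence


def contrast_beta(image: Sequence[Sequence[float]]) -> float:
    """Per-image contrast term: 1 / (2 * mean squared difference of 4-neighbors)."""
    sq = []
    for row in image:                          # horizontal neighbor pairs
        sq.extend((a - b) ** 2 for a, b in zip(row, row[1:]))
    for upper, lower in zip(image, image[1:]):  # vertical neighbor pairs
        sq.extend((a - b) ** 2 for a, b in zip(upper, lower))
    mean = sum(sq) / len(sq)
    return 1.0 / (2.0 * mean) if mean > 0 else 0.0


def edge_cost(xi: float, xj: float, beta: float, beta0: float = 0.0) -> float:
    """Assumed edge cost: low where the image contrast between i and j is high."""
    return beta0 + math.exp(-beta * (xi - xj) ** 2)
```

Because the cost drops where neighboring intensities differ strongly, placing an object-edge transition on an image boundary is cheaper than placing it in a flat region.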

LayoutCRF Potentials: Instance. These are look-up tables (histograms). They encourage the correct spatial layout of parts for each object instance by gravitating parts towards their expected positions, given the transformation of the instance. In the slide's equation, one parameter weighs the strength of the potential, and a helper function returns position i inverse-transformed by the transformation T_m.
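A hedged sketch of the look-up: the pixel position is mapped back into the object's reference frame, and the table entry for the part label is raised to a strength parameter. The forward transform (flip within the reference frame, then translate), the exponent form of the weighting, and the names `inverse_transform`, `instance_potential`, `nu`, and `eps` are assumptions.

```python
from typing import Dict, List, Tuple

Table = List[List[Dict[int, float]]]  # table[v][u] maps part label -> probability


def inverse_transform(pos: Tuple[int, int], dx: int, dy: int,
                      flip: bool, width: int) -> Tuple[int, int]:
    """Map image position (x, y) into the object's reference frame."""
    x, y = pos
    u, v = x - dx, y - dy
    if flip:
        u = width - 1 - u  # undo the left/right mirror
    return u, v


def instance_potential(h: int, pos: Tuple[int, int],
                       transform: Tuple[int, int, bool], table: Table,
                       nu: float = 1.0, eps: float = 1e-6) -> float:
    """Look up P(h | inverse-transformed position), weighted by exponent nu."""
    u, v = inverse_transform(pos, *transform, width=len(table[0]))
    if not (0 <= v < len(table) and 0 <= u < len(table[0])):
        return eps ** nu  # position falls outside the object's bounding box
    return max(table[v][u].get(h, 0.0), eps) ** nu
```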

LayoutCRF: What comes next? We have just defined the LayoutCRF and its potentials. First we need to learn the parameters of the LayoutCRF from labeled training data. Then we apply the model to a new image (inference) to obtain a detection and segmentation.

Learning (the model parameters): Supervised. The algorithm requires a foreground/background segmentation, but not part labels.

Unary Potential and Part Labeling: The part labeling for the training images is initialized from a dense regular grid that fits the object bounding box. Unary classifiers are learned, then a new labeling is inferred (two iterations are sufficient). The dense grid is spatially quantized such that a unique part covers several pixels (on average 8x8).
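The grid initialization can be sketched as follows, assuming a binary foreground mask per training image. The function name, the row-major numbering of parts starting at 1, and the exact quantization rule are assumptions.

```python
from typing import List, Sequence


def initial_part_labels(mask: Sequence[Sequence[bool]],
                        grid_cols: int, grid_rows: int) -> List[List[int]]:
    """Label each foreground pixel with the grid cell it falls in.

    The grid is stretched to fit the bounding box of the mask. Background
    pixels get 0; cell (row gr, column gc) gets label 1 + gr * grid_cols + gc.
    """
    labels = [[0] * len(row) for row in mask]
    fg = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not fg:
        return labels
    top, bottom = min(r for r, _ in fg), max(r for r, _ in fg)
    left, right = min(c for _, c in fg), max(c for _, c in fg)
    height, width = bottom - top + 1, right - left + 1
    for r, c in fg:
        gr = (r - top) * grid_rows // height   # quantized grid row
        gc = (c - left) * grid_cols // width   # quantized grid column
        labels[r][c] = 1 + gr * grid_cols + gc
    return labels
```

Choosing `grid_cols` and `grid_rows` so that each cell spans roughly 8x8 pixels would match the average part size quoted above.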

Learning Pair-wise Potentials: Parameters are learned via cross-validation by a search over a sensible range of positive values. Gradient-based ML learning is too slow (future work: a more efficient means of learning these parameters).

Learning Instance Potentials: The deformed part labelings of all training images are aligned on their segmentation mask centroids. A bounding box is placed relative to the centroid around the part labelings. For each pixel within the bounding box, the distribution over part labels is learned by histogramming the deformed training image labels. The result is an empirical distribution over parts h given position w.
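A minimal sketch of the histogramming step, under several assumptions: each training labeling is a 2-D grid with 0 for background, the centroid is rounded to the nearest pixel, the bounding box is centered on the centroid, and only foreground labels are counted. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def learn_part_histograms(labelings: Sequence[Sequence[Sequence[int]]],
                          box_h: int, box_w: int) -> List[List[Dict[int, float]]]:
    """Empirical P(part h | box position w) from centroid-aligned labelings."""
    counts = [[Counter() for _ in range(box_w)] for _ in range(box_h)]
    for lab in labelings:
        fg = [(r, c) for r, row in enumerate(lab) for c, h in enumerate(row) if h]
        if not fg:
            continue
        cr = round(sum(r for r, _ in fg) / len(fg))  # centroid row
        cc = round(sum(c for _, c in fg) / len(fg))  # centroid column
        for r, c in fg:
            v, u = r - cr + box_h // 2, c - cc + box_w // 2
            if 0 <= v < box_h and 0 <= u < box_w:
                counts[v][u][lab[r][c]] += 1
    # Normalize each position's counts; positions never observed stay empty.
    return [[{h: n / sum(cnt.values()) for h, n in cnt.items()} for cnt in row]
            for row in counts]
```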

Inference (on a novel image): Initially, we don’t know the number of object instances or their locations. Step 1: collapse part labels across instances, merge instance labels together, and remove transformations. MAP inference is then performed to obtain a part labeling image h*.

Inference (on a novel image), Step 2: determine the number of layout-consistent regions in h* using connected component analysis; pixels are connected if they are layout-consistent. This gives an estimate of M (the number of object instances) and an initial instance labeling. T is then estimated separately for each instance label.
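Step 2 can be sketched as a breadth-first flood fill in which two 4-connected pixels are joined only if their part labels are layout-consistent. The representation (`None` for background, `(column, row)` grid positions for parts) and the directional consistency rule are assumptions.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Part = Optional[Tuple[int, int]]  # (grid column, grid row), or None for background


def consistent(a: Tuple[int, int], b: Tuple[int, int], direction: str) -> bool:
    """Assumed rule: b lies to the right of / below a in the image."""
    if direction == "right":
        return b[1] == a[1] and b[0] in (a[0], a[0] + 1)
    return b[0] == a[0] and b[1] in (a[1], a[1] + 1)


def label_instances(parts: Sequence[Sequence[Part]]) -> Tuple[int, List[List[int]]]:
    """Return (M, instance map); instance 0 is background."""
    rows, cols = len(parts), len(parts[0])
    inst = [[0] * cols for _ in range(rows)]
    steps = [(0, 1, "right", True), (0, -1, "right", False),
             (1, 0, "below", True), (-1, 0, "below", False)]
    m = 0
    for r in range(rows):
        for c in range(cols):
            if parts[r][c] is None or inst[r][c]:
                continue
            m += 1                      # start a new layout-consistent region
            inst[r][c] = m
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx, direction, forward in steps:
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if parts[ny][nx] is None or inst[ny][nx]:
                        continue
                    # Order the pair so the first label is the left/upper pixel.
                    a, b = ((parts[y][x], parts[ny][nx]) if forward
                            else (parts[ny][nx], parts[y][x]))
                    if consistent(a, b, direction):
                        inst[ny][nx] = m
                        queue.append((ny, nx))
    return m, inst
```

In a row labeled (0,0), (1,0), (0,0), (1,0), the grid ordering restarts at the third pixel, so the fill reports two instances even though all four pixels are foreground and touching.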

Inference (on a novel image), Step 3: re-run MAP inference with the full model to get the full h, which now distinguishes between instances.

Approximate MAP inference via the Annealed Expansion Move Algorithm: Alternate regular grid expansions at random offsets with standard alpha expansions (for changing to the BG label). The annealing schedule weakens the pair-wise potential during the early stages by raising it to a power less than one.

Results on Cars (training on images that contain only one visible car instance). Figure label: False Positive.

Segmentation Accuracy on Cars: Segmentation accuracy was evaluated on 20 randomly chosen images of cars, containing 34 car instances. Segmentation accuracy per instance is the ratio of the intersection to the union of the detected and ground-truth segmentations; the reported value is 0.67.
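The per-instance accuracy measure is the familiar intersection-over-union ratio. A small sketch for binary masks follows; the function name and the convention of returning 0.0 when both masks are empty are assumptions.

```python
from typing import Sequence


def segmentation_accuracy(detected: Sequence[Sequence[bool]],
                          truth: Sequence[Sequence[bool]]) -> float:
    """Intersection over union of two same-sized binary masks."""
    inter = union = 0
    for d_row, t_row in zip(detected, truth):
        for d, t in zip(d_row, t_row):
            inter += bool(d and t)   # pixel is in both masks
            union += bool(d or t)    # pixel is in at least one mask
    return inter / union if union else 0.0
```

For example, a detection that overlaps the ground truth on one pixel out of three covered in total scores 1/3.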

Results on Faces

Multi-class LayoutCRF (Future Work)

Summary: The LayoutCRF is used to detect multiple instances of an object of a given class. A deformed-grid part labeling densely covers the object. Detection and segmentation are performed simultaneously.

Questions?

Shortcomings: Parts are not shared (remember the deterministic mapping). The model assumes objects at a fixed scale and with a relatively rigid layout (transformations are just orientation flips and translations). There is no incentive for disconnected regions to belong to the same instance. Are the ‘parts’ really object parts?

References
J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning, 2001.
M. Szummer. Learning diagram parts with hidden random fields. In International Conference on Document Analysis and Recognition, 2005.
J. Winn and J. Shotton. The Layout Consistent Random Field for Recognizing and Segmenting Partially Occluded Objects. In CVPR, 2006.