What, Where & How Many? Combining Object Detectors and CRFs

Presentation transcript:

What, Where & How Many? Combining Object Detectors and CRFs Ľubor Ladický, Paul Sturgess, Karteek Alahari, Christopher Russell, Philip H.S. Torr http://cms.brookes.ac.uk/research/visiongroup/ 1

Complete Scene Understanding Involves Localization of all instances of foreground objects (“things”) Localization of all background classes (“stuff”) Pixel-wise segmentation 3D reconstruction Pose detection Action recognition Event recognition METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 2

Semantic Segmentation. Complete scene understanding involves localization of all instances of foreground objects (“things”), localization of all background classes (“stuff”), pixel-wise segmentation, 3D reconstruction, pose detection, action recognition, and event recognition. 3

Complete scene understanding involves localization of all instances of foreground objects (“things”), localization of all background classes (“stuff”), and pixel-wise segmentation. 4

Semantic Scene Understanding. We're interested in whole scene understanding. Given an image, detect every thing in it. Thing: an object with a specific size and shape. Adelson, Forsyth et al. 96 5

Semantic Scene Understanding. We're interested in whole scene understanding. Given an image, label all the stuff. Stuff: material defined by a homogeneous or repetitive pattern, with no specific spatial extent or shape. Adelson, Forsyth et al. 96 10

Semantic Scene Understanding. Our plan is to combine state-of-the-art sliding window object detection with state-of-the-art segmentation techniques. 12

Algorithms for Object Localization. Sliding window detectors: HOG descriptor (Dalal & Triggs CVPR05); based on histograms of features (Vedaldi et al. ICCV09); part-based models (Felzenszwalb et al. CVPR09). 13
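As a rough illustration of what "sliding window" means here, the sketch below enumerates candidate boxes over an image at several scales. The function name, the parameters and the stride scheme are assumptions made for illustration; they are not taken from any of the cited detectors, and the scoring of each window (HOG, histograms of features, part-based models) is left out entirely.

```python
def sliding_windows(img_w, img_h, win_w, win_h, stride, scales=(1.0,)):
    """Yield candidate boxes (x, y, w, h) covering the image.

    Hypothetical helper: a real detector would score each box with a
    classifier; here we only enumerate the hypotheses.
    """
    for s in scales:
        w, h = int(round(win_w * s)), int(round(win_h * s))
        if w > img_w or h > img_h:
            continue  # window at this scale does not fit in the image
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


# Example: an 8x8 image, 4x4 window, stride 4 gives four non-overlapping boxes.
boxes = list(sliding_windows(8, 8, 4, 4, 4))
```

Every box is a fixed-aspect rectangle, which is why this family of methods suits compact objects better than irregular regions.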

Algorithms for Object Localization. Non-maxima suppression: greedily, or using a field of indicator variables (Ramanan et al. ICCV09) with one binary variable y_i ∈ {0, 1} per location/scale. 16
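The greedy variant of non-maxima suppression can be sketched as follows. This is a generic textbook version, not the talk's implementation: the box format (x1, y1, x2, y2), the intersection-over-union overlap measure and the 0.5 threshold are all assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is kept only if it does not overlap an already kept box by
    more than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)  # the second box is suppressed by the first
```

The decision for each box is final once made, so a wrongly kept high-scoring box can never be revisited.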

Sliding window detectors. Sliding window + segmentation: OBJCUT (Kumar et al. 05); updating colour model (GrabCut, Rother et al. 04). 18

Sliding window detectors are not good for “stuff”. Try to detect “sky”! 19

Sliding window detectors are not good for “stuff”. Sky has an irregular shape that is not suited to the sliding window approach. 20

Sliding window detectors. State-of-the-art for object detection (“things”). Do not work for background classes (“stuff”): no distinct shape; cannot be enclosed in a box. Cannot recover from incorrect detections. 21

Algorithms for Object-class Segmentation. Pairwise CRF over pixels: input image, CRF construction, final segmentation. Shotton et al. ECCV06 22

Algorithms for Object-class Segmentation. Pairwise CRF over pixels: input image, CRF construction, training of potentials. Shotton et al. ECCV06 23

Algorithms for Object-class Segmentation. Pairwise CRF over pixels: input image, CRF construction, training of potentials, MAP, final segmentation. Shotton et al. ECCV06 24
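To make the pairwise CRF concrete, the sketch below evaluates the energy of one labelling on a pixel grid: a sum of per-pixel unary costs plus a penalty for every pair of neighbouring pixels with different labels. The plain Potts penalty, the 4-connected neighbourhood and the data layout are simplifying assumptions for illustration; the cited work uses learned potentials that are not reproduced here. MAP inference then means finding the labelling with the lowest such energy.

```python
def pairwise_crf_energy(labels, unary, potts_weight):
    """Energy of a labelling under a 4-connected Potts pairwise CRF.

    labels[r][c]    -> integer label of pixel (r, c)
    unary[r][c][l]  -> cost of assigning label l to pixel (r, c)
    potts_weight    -> cost paid for each neighbouring pair that disagrees
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for r in range(h):
        for c in range(w):
            energy += unary[r][c][labels[r][c]]
            # Count each edge once: look only right and down.
            if c + 1 < w and labels[r][c] != labels[r][c + 1]:
                energy += potts_weight
            if r + 1 < h and labels[r][c] != labels[r + 1][c]:
                energy += potts_weight
    return energy


# Left column prefers label 0, right column prefers label 1.
unary = [[[0.0, 1.0], [1.0, 0.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
split = pairwise_crf_energy([[0, 1], [0, 1]], unary, 0.5)    # two cut edges
uniform = pairwise_crf_energy([[0, 0], [0, 0]], unary, 0.5)  # no cut edges
```

In this toy case the split labelling is cheaper because the unary savings outweigh the two boundary penalties.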

Algorithms for Object-class Segmentation. Pairwise CRF over super-pixels / segments: input image, unsupervised segmentation. Shi, Malik PAMI2000; Comaniciu, Meer PAMI2002; Felzenszwalb, Huttenlocher IJCV2004 25

Algorithms for Object-class Segmentation. Pairwise CRF over super-pixels / segments: input image, unsupervised segmentation, training of potentials. Batra et al. CVPR08, Yang et al. CVPR07, Zitnick et al. CVPR08, Rabinovich et al. ICCV07, Boix et al. CVPR10 27

Algorithms for Object-class Segmentation. Pairwise CRF over super-pixels / segments: input image, unsupervised segmentation, training of potentials, MAP, final segmentation. Batra et al. CVPR08, Yang et al. CVPR07, Zitnick et al. CVPR08, Rabinovich et al. ICCV07, Boix et al. CVPR10 28

Algorithms for Object-class Segmentation. Associative Hierarchical CRF: input image, multiple segmentations or hierarchies. Ladický et al. ICCV09 29

Algorithms for Object-class Segmentation. Associative Hierarchical CRF: input image, multiple segmentations or hierarchies, CRF construction. Ladický et al. ICCV09 30

Algorithms for Object-class Segmentation. Associative Hierarchical CRF: input image, multiple segmentations or hierarchies, CRF construction, MAP, final segmentation. Ladický et al. ICCV09, Russell et al. UAI10 31

Associative Hierarchical CRF. State-of-the-art performance on MSRC & CamVid. Merges information at multiple scales. Can recover from incorrect segmentations. Ladický et al. ICCV09, Sturgess et al. BMVC09 32

Associative Hierarchical CRF. State-of-the-art performance on MSRC & CamVid. Merges information at multiple scales. Can recover from incorrect segmentations. However: no concept of “things”; cannot distinguish between multiple instances. Ladický et al. ICCV09, Sturgess et al. BMVC09 33

CRF Formulation with Detectors. CRF formulation altered with a potential for each detection. 34

CRF Formulation with Detectors. CRF formulation altered with a potential for each detection. Equation annotation: AH-CRF energy without detectors. Figure: CRF graph over pixels. 35

CRF Formulation with Detectors. CRF formulation altered with a potential for each detection. Equation annotations: AH-CRF energy without detectors; set of pixels of d-th detection; classifier response; detected label. Figure: CRF graph over pixels. 36
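The equation on this slide did not survive transcription. One way to write an energy that matches the four annotations is given below; the symbol names (E_pix, ψ_d, x_d, H_d, l_d, D) are assumptions chosen to line up with the slide's labels, not a verbatim copy of the original formula.

```latex
\[
E(\mathbf{x}) \;=\; E_{\mathrm{pix}}(\mathbf{x}) \;+\; \sum_{d \in D} \psi_d(\mathbf{x}_d, H_d, l_d)
\]
% E_pix : AH-CRF energy without detectors
% x_d   : set of pixels of the d-th detection
% H_d   : classifier response of the d-th detection
% l_d   : label detected by the d-th detection
% D     : set of all detections
```

Each detection contributes one extra term that depends only on the labels of the pixels it covers.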

CRF Formulation with Detectors. Joint CRF formulation should contain: possibility to reject detection hypothesis; recover the status of the detection (0 / 1). Thus, potential is a minimum over indicator variable y_d ∈ {0, 1}. Figure: CRF graph over pixels, with indicator variables. 37

CRF Formulation with Detectors. Detection potential should decrease the energy of labellings agreeing with the detection hypothesis. Partial disagreement is penalized but does not directly reject the hypothesis. Figure: CRF graph over pixels, with indicator variables. 38

CRF Formulation with Detectors. Detection potential should decrease the energy of labellings agreeing with the detection hypothesis. Partial disagreement is penalized but does not directly reject the hypothesis. Equation annotations: detection hypothesis status; partial inconsistency cost; detection strength. Figure: CRF graph over pixels. 39

CRF Formulation with Detectors. Detection potential should decrease the energy of labellings agreeing with the detection hypothesis. Partial disagreement is penalized but does not directly reject the hypothesis. Equation annotations: linear thresholded; linear. Figure: CRF graph over pixels. 40
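The behaviour described on these slides can be sketched in a few lines. The version below assumes a detection strength that is linear in the thresholded classifier response and an inconsistency cost that is linear in the number of disagreeing pixels; the parameter names (`weight`, `threshold`, `inconsistency_fraction`) and the exact scaling are illustrative assumptions, since the slide formulas are not in the transcript.

```python
def detection_potential(pixel_labels, detected_label, response,
                        threshold, weight, inconsistency_fraction):
    """Return (energy, y_d) for one detection hypothesis.

    y_d = 1 means the hypothesis is accepted, y_d = 0 means rejected.
    The energy is the minimum over y_d of y_d * (-strength + cost).
    """
    n_pixels = len(pixel_labels)
    if n_pixels == 0:
        return 0.0, 0
    # Detection strength: linear in the response above a threshold.
    strength = weight * n_pixels * max(0.0, response - threshold)
    # Partial inconsistency cost: linear in the number of disagreeing pixels.
    n_disagree = sum(1 for label in pixel_labels if label != detected_label)
    cost = strength * n_disagree / (inconsistency_fraction * n_pixels)
    accepted_energy = -strength + cost
    if accepted_energy < 0.0:
        return accepted_energy, 1  # accepting lowers the energy
    return 0.0, 0                  # rejecting costs nothing


mostly_car = ["car"] * 8 + ["road"] * 2
mostly_road = ["car"] * 4 + ["road"] * 6
accepted = detection_potential(mostly_car, "car", 1.5, 0.5, 1.0, 0.5)
rejected = detection_potential(mostly_road, "car", 1.5, 0.5, 1.0, 0.5)
```

With these assumed settings a detection is dropped once more than half of its pixels disagree with the detected label, and summing the returned y_d values over all detections of a class gives a count of accepted instances.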

Inference over CRF with Detectors. This higher order potential can be transformed to take the form of Robust P^N (Kohli et al. CVPR08), which is solvable with all graph cut-based methods. 41
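A short derivation shows why such a transformation is plausible. It assumes the detection potential is a minimum over a binary indicator y_d of a strength term f and a linear inconsistency cost g; the symbols f, g, k_d and N_d are assumptions introduced for this sketch, not notation recovered from the slide.

```latex
\[
\psi_d(\mathbf{x}_d) \;=\; \min_{y_d \in \{0,1\}} \bigl( -f\,y_d + g(N_d)\,y_d \bigr),
\qquad
N_d \;=\; \sum_{i \in \mathbf{x}_d} \delta(x_i \neq l_d),
\qquad
g(N_d) \;=\; k_d\,N_d .
\]
% y_d = 0 gives 0; y_d = 1 gives -f + k_d N_d. Taking the smaller of the two:
\[
\psi_d(\mathbf{x}_d) \;=\; \min\Bigl( 0,\; -f + k_d \sum_{i \in \mathbf{x}_d} \delta(x_i \neq l_d) \Bigr)
\]
```

Up to the constant f, this is a cost that grows linearly with the number of pixels disagreeing with label l_d and is then truncated, which is the shape of a robust label-consistency potential.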

Results on CamVid dataset. Panels: result without detections; set of detections; final result. Brostow et al. ECCV08, Sturgess et al. BMVC09 42

Results on CamVid dataset. Panels: result without detections; set of detections; final result. Also provides number of object instances (using the y_d's). 43

Results on VOC2009 dataset. Panels: input image; CRF without detectors; CRF with detectors. 44

Summary. Novel formulation of CRF with detectors. Can recover from incorrect detections. Possible to obtain instances of objects. Efficiently solvable. 45

Future and ongoing work. Automatic non-maximum suppression. Part-based models in CRF. New things & stuff dataset available soon! Questions? 46