What, Where & How Many? Combining Object Detectors and CRFs Ľubor Ladický, Paul Sturgess, Karteek Alahari, Christopher Russell, Philip H.S. Torr http://cms.brookes.ac.uk/research/visiongroup/ 1
Complete Scene Understanding Involves Localization of all instances of foreground objects (“things”) Localization of all background classes (“stuff”) Pixel-wise segmentation 3D reconstruction Pose detection Action recognition Event recognition METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 2
Semantic Segmentation Complete Scene Understanding Involves Localization of all instances of foreground objects (“things”) Localization of all background classes (“stuff”) Pixel-wise segmentation 3D reconstruction Pose detection Action recognition Event recognition Semantic Segmentation METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 3
Complete Scene Understanding Involves Localization of all instances of foreground objects (“things”) Localization of all background classes (“stuff”) Pixel-wise segmentation METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 4
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, detect every thing in it. Thing : An object with a specific size and shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 5
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, detect every thing in it. Thing : An object with a specific size and shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 6
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, detect every thing in it. Thing : An object with a specific size and shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 7
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, detect every thing in it. Thing : An object with a specific size and shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 8
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, detect every thing in it. Thing : An object with a specific size and shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 9
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, label all the stuff Stuff : Material defined by a homogeneous or repetitive pattern, with no specific spatial extent / shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 10
We're interested in whole scene understanding Semantic Scene Understanding We're interested in whole scene understanding Given an image, label all the stuff Stuff : Material defined by a homogeneous or repetitive pattern, with no specific spatial extent / shape. METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Adelson, Forsyth et al. 96 11
Our plan is to combine Semantic Scene Understanding State of the art sliding window object detection State of the art segmentation techniques car METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 12
Algorithms for Object Localization Sliding window detectors HOG descriptor (Dalal & Triggs CVPR05) Based on histograms of features (Vedaldi et al. ICCV09) Part-based models (Felzenszwalb et al. CVPR09) METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 13
Algorithms for Object Localization Sliding window detectors HOG descriptor (Dalal & Triggs CVPR05) Based on histograms of features (Vedaldi et al. ICCV09) Part-based models (Felzenszwalb et al. CVPR09) METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 14
Algorithms for Object Localization Sliding window detectors HOG descriptor (Dalal & Triggs CVPR05) Based on histograms of features (Vedaldi et al. ICCV09) Part-based models (Felzenszwalb et al. CVPR09) METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 15
Algorithms for Object Localization METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Non-maxima suppression Greedily Using field of indicator variables (Ramanan et al ICCV09) One binary variable yi { 0, 1 } per location/scale 16
Algorithms for Object Localization car METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Non-maxima suppression Greedily Using field of indicator variables (Ramanan et al ICCV09) One binary variable yi { 0, 1 } per location/scale 17
Sliding window detectors car Sliding window + Segmentation OBJCUT (Kumar et al. 05) Updating colour model (GrabCut - Rother et al. 04) METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 18
Sliding window detectors not good for “stuff” METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Try to detect “sky” ! 19
Sliding window detectors not good for “stuff” sky METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Sky is irregular shape not suited to the sliding window approach 20
State-of-the-art for object detection (“things”) Sliding window detectors State-of-the-art for object detection (“things”) Do not work for background classes (“stuff”) No distinct shape Cannot be enclosed in a box Cannot recover from incorrect detections METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 21
Pairwise CRF over pixels Algorithms for Object-class Segmentation Pairwise CRF over pixels CRF construction Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Final segmentation Shotton et al. ECCV06 22
Training of Potentials Algorithms for Object-class Segmentation Pairwise CRF over pixels CRF construction Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Training of Potentials Shotton et al. ECCV06 23
Training of Potentials Algorithms for Object-class Segmentation Pairwise CRF over pixels CRF construction Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Training of Potentials MAP Final segmentation Shotton et al. ECCV06 24
Pairwise CRF over Super-pixels / Segments Algorithms for Object-class Segmentation Pairwise CRF over Super-pixels / Segments Unsupervised segmentation Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Shi, Malik PAMI2000, Comaniciu, Meer PAMI2002, Felzenszwalb, Huttenlocher, IJCV2004 25
Pairwise CRF over Super-pixels / Segments Algorithms for Object-class Segmentation Pairwise CRF over Super-pixels / Segments Unsupervised segmentation Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Shi, Malik PAMI2000, Comaniciu, Meer PAMI2002, Felzenszwalb, Huttenlocher, IJCV2004 26
Training of potentials Algorithms for Object-class Segmentation Pairwise CRF over Super-pixels / Segments Unsupervised segmentation Input image Training of potentials METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Batra et al. CVPR08, Yang et al. CVPR07, Zitnick et al. CVPR08, Rabinovich et al. ICCV07, Boix et al. CVPR10 27
Training of potentials Algorithms for Object-class Segmentation Pairwise CRF over Super-pixels / Segments Unsupervised segmentation Input image Training of potentials METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features MAP Batra et al. CVPR08, Yang et al. CVPR07, Zitnick et al. CVPR08, Rabinovich et al. ICCV07, Boix et al. CVPR10 Final segmentation 28
Associative Hierarchical CRF Algorithms for Object-class Segmentation Associative Hierarchical CRF Multiple segmentations or hierarchies Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Ladický et al. ICCV09 29
Associative Hierarchical CRF Algorithms for Object-class Segmentation Associative Hierarchical CRF Multiple segmentations or hierarchies CRF construction Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Ladický et al. ICCV09 30
Associative Hierarchical CRF Algorithms for Object-class Segmentation Associative Hierarchical CRF Multiple segmentations or hierarchies CRF construction Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features MAP Final segmentation Ladický et al. ICCV09, Russell et al. UAI10 31
State-of-the-art performance on MSRC & CamVid Associative Hierarchical CRF State-of-the-art performance on MSRC & CamVid Merges information at multiple scales Can recover from incorrect segmentations METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Ladický et al. ICCV09, Sturgess et al., BMVC09 32
State-of-the-art performance on MSRC & CamVid Associative Hierarchical CRF State-of-the-art performance on MSRC & CamVid Merges information at multiple scales Can recover from incorrect segmentations However, No concept of “things” Cannot distinguish between multiple instances METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features car Ladický et al. ICCV09, Sturgess et al., BMVC09 33
CRF Formulation with Detectors CRF formulation altered with a potential for each detection METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 34
AH-CRF energy without detectors CRF Formulation with Detectors CRF formulation altered with a potential for each detection AH-CRF energy without detectors METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features CRF graph over pixels 35
CRF Formulation with Detectors CRF formulation altered with a potential for each detection AH-CRF energy without detectors Set of pixels of d-th detection Classifier response Detected label METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features CRF graph over pixels 36
CRF Formulation with Detectors Joint CRF formulation should contain Possibility to reject detection hypothesis Recover the status of the detection (0 / 1) Thus, potential is a minimum over indicator variable yd { 0, 1 } Indicator variables METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features CRF graph over pixels 37
CRF Formulation with Detectors Detection potential should decrease the energy of labelling agreeing with the detection hypothesis Partial disagreement penalized but not directly rejects the hypothesis Indicator variables METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features CRF graph over pixels 38
CRF Formulation with Detectors Detection potential should decrease the energy of labelling agreeing with the detection hypothesis Partial disagreement penalized but not directly rejects the hypothesis Detection hypothesis status Partial inconsistency cost Detection strength METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features CRF graph over pixels 39
CRF Formulation with Detectors Detection potential should decrease the energy of labelling agreeing with the detection hypothesis Partial disagreement penalized but not directly rejects the hypothesis Linear thresholded Linear METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features CRF graph over pixels 40
This higher order potential can be transformed to Inference over CRF with Detectors This higher order potential can be transformed to Solvable with all graph cut-based methods which take the form of Robust PN (Kohli et al. CVPR08) METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 41
Result without detections Results on CamVid dataset Result without detections Set of detections Final Result METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Brostow et al. Sturgess et al. Brostow et al. ECCV08, Sturgess et al. BMVC09 42
Result without detections Results on CamVid dataset Result without detections Set of detections Final Result METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features Also provides number of object instances (using yd’s) 43
Results on VOC2009 dataset CRF without detectors CRF with detectors Input image Input image METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 44
Novel formulation of CRF with detectors Summary Novel formulation of CRF with detectors Can recover from incorrect detections Possible to obtain instances of objects Efficiently solvable METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 45
Future and ongoing work Automatic non-maximum suppression Part-based models in CRF New things & stuff dataset available soon! Questions? METHOD time: complementary set of features Wide variety of object classes found in road scenes HO high quality object-class boundaries Multiclass classifier with feature selection High quality ground truth Discussion Next->features 46