Download presentation
1
Regionlets for Generic Object Detection
Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin NEC Labs America, Inc Facebook, Inc. ICCV 2013 oral paper Presenter: Ming-Ming VGG reading group, 6/2/2014.
2
Generic object detection
3
Review: rigid vs. deformable
Rigid objects Tackles local variations, hardly handle deformations Classical cascaded AdaBoost Deformable part models (DPM) Spatial pyramid matching (SPM) of BoW Deformable objects High deformation tolerance results in imprecise localization or false positives for rigid objects Can we mode both rigid and deformable objects in a unified framework?
4
Review: parameter selection
Deformable Part-based Model (DPM) Specify the number of deformable parts Spatial Pyramid Matching Specify the number of pyramids to build Do we have to pre-define model parameters to handle different degrees of deformation?
5
Review: multi scales/viewpoints
DPM Resize an image to detect objects at a fixed scale Multiple models, each deals with one viewpoint Spatial Pyramid Matching No need to resize the image One model, a codebook is used to encode features Can we learn a model that can be easily adapted to arbitrary scales and viewpoints?
6
Review: sliding window vs. selective search
Classical: sliding window Check hundreds of thousands (if not millions) of sliding windows. Resize image and detect objects at fixed scale Needs to use very efficient classifiers Recent: selective search Thousands of object-independent candidate regions [12PAMI] B. Alexe, T. Deselaers, and V. Ferrari, Measuring the objectness of image window. [13IJCV] K. E. A. Van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders, Selective search for object recognition [13PAMI] Ian Endres, and Derek Hoiem, Category-independent object proposals with diverse ranking. [11ICCV] E. Rahtu, J. Kannala, and M. Blaschko. Learning a category independent object detection cascade. [11CVPR] Z. Zhang, J. Warrell, and P. H. S. Torr. Proposal generation for object detection using cascaded ranking SVMs. InCVPR, 2011 Allowing strong classifier, needs to work with arbitrary window size
7
Motivation A flexible and general object-level representation with
Hassle free deformation handling Arbitrary scales and aspect ratio handling
8
Detection framework Generate candidate detection bounding boxes
Boosting classifier cascades
9
Regionlet: Definition
Region (𝑅): feature extraction region Regionlet ( 𝑟 1 , 𝑟 2 , 𝑟 3 ): A sub-region in a feature extraction area whose position/resolution are relative and normalized to a detection window Bounding box Region Regionlets Why regionlets? Some part of the region may not be informative or even distractive. Why ‘regionlets’? Define sub-parts of regions, as the basic units to extract appearance features. Then organize these subparts into small groups which are more flexible to describe distinct object categories with different degree of deformations. The features of these subregions will be aggregated to a single feature for R, and they are bellow the level of standalone feature extraction region in an object classifier.
10
Regionlet: definition (cont.)
Relative normalized position
11
Regionlet: Feature extraction
12
Regionlet Constructing the regions/regionlets pool
Small region, fewer regionlets -> fine spatial layout Large region, more regionlets -> robust to deformation Use cascaded boosting learning to select the most discriminative regionlets from a largely over-complete pool
13
Regionlet: Training Construct regionlets pool
Enumerating all possible regions is impractical and not necessary. Propose a set of regionlets with random positions (up to 5) inside each region with identical size.
14
Regionlet: Training Constructing the regions/regionlets poolLearning Boosting cascades 16K region/regionlets candidates for each cascade Learning of each cascade stops when the error rate is achieved (1% for positive, 37.5% for negative) Last cascade stops after collecting 5000 weak classifiers Result in 4-7 cascades 2-3 hours to finish training one category on a 8-core machine
15
Regionlets: Testing No image resizing Any scale, any aspect ratio
Adapt the model size to the same size as the object candidate bounding box
16
Experiments Datasets Investigated Features PASCAL VOC 2007, 2010
20 object categories ImageNet Large Scale Object Detection Dataset 200 object categories Investigated Features HOG LBP Covariance Deep Convolutional Neural Network (DCNN) feature (only for the ImageNet challenge)
17
Experiments: PASCAL VOC
Baselines DPM: excellent at detecting rigid objects. E.g. car, bus, etc. SS_SPM: for objects with significant global deformations. E.g. cat, cow sheep, etc. SPM outperforms DPM in airplane and TV. Due to very diverse viewpoints or rich sub-categories. Objectness (use old version of DPM): selective search itself dose not benefit detection tool much (0.6% improvement) in terms of accuracy.
18
Experiments: PASCAL VOC
Baselines: DPM SPM Objectness Regionlets Won 16 out of 20 categories ‘To our best knowledge, is the best performance in VOC07’ Multiple regionlets consistently outperforms single regionlets
19
Experiments: PASCAL VOC
20
Experiments: ImageNet
21
Running speed 0.2 second per image using a single core if candidate bounding boxes are given, real time(>30 frames per second) using 8 cores 2 seconds per image to generate candidate bounding boxes 2-3 hours to finish training one category on a 8-core machine
22
Conclusions A new object representation for object detection
Non-local max-pooling of regionlets Relative normalized locations of regionlets Flexibility to incorporate various types of features A principled data-driven detection framework, effective in handling deformation, multiple scales, multiple viewpoints Superior performance with a fast running speed
23
Latest results Regionlets with Deep CNN feature (outside data)
24
Some really exciting parts during the reading group will be made public valuable after CVPR 2014 results announced, together with source code!
25
My thoughts towards a real-time system
Current bottleneck is object proposal generation State of the art methods needs 2+ seconds to process one image [IJCV13, PAMI12, PAMI13] Our recent results Training on VOC: 20s Testing speed: 300fps Generic ability
26
My thoughts towards a real-time system
27
My thoughts towards a real-time system
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.