Regionlets for Generic Object Detection

Presentation transcript:

Regionlets for Generic Object Detection
Xiaoyu Wang, Ming Yang, Shenghuo Zhu, and Yuanqing Lin
NEC Labs America, Inc.; Facebook, Inc.
ICCV 2013 oral paper
Presenter: Ming-Ming Cheng @ VGG reading group, 6/2/2014

Generic object detection

Review: rigid vs. deformable
Rigid objects
Tackles local variations, hardly handles deformations
Classical cascaded AdaBoost
Deformable part models (DPM)
Spatial pyramid matching (SPM) of BoW
Deformable objects
High deformation tolerance results in imprecise localization or false positives for rigid objects
Can we model both rigid and deformable objects in a unified framework?

Review: parameter selection
Deformable Part-based Model (DPM)
  Specify the number of deformable parts
Spatial Pyramid Matching
  Specify the number of pyramids to build
Do we have to pre-define model parameters to handle different degrees of deformation?

Review: multi scales/viewpoints
DPM
  Resize an image to detect objects at a fixed scale
  Multiple models, each deals with one viewpoint
Spatial Pyramid Matching
  No need to resize the image
  One model, a codebook is used to encode features
Can we learn a model that can be easily adapted to arbitrary scales and viewpoints?

Review: sliding window vs. selective search
Classical: sliding window
  Check hundreds of thousands (if not millions) of sliding windows
  Resize the image and detect objects at a fixed scale
  Needs very efficient classifiers
Recent: selective search
  Thousands of object-independent candidate regions
  Allows a strong classifier; needs to work with arbitrary window sizes
[12PAMI] B. Alexe, T. Deselaers, and V. Ferrari. Measuring the objectness of image windows.
[13IJCV] K. E. A. van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders. Selective search for object recognition.
[13PAMI] I. Endres and D. Hoiem. Category-independent object proposals with diverse ranking.
[11ICCV] E. Rahtu, J. Kannala, and M. Blaschko. Learning a category independent object detection cascade.
[11CVPR] Z. Zhang, J. Warrell, and P. H. S. Torr. Proposal generation for object detection using cascaded ranking SVMs.
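To make the sliding-window cost on the slide above concrete, here is a minimal counting sketch. The function name, image size, window size, stride, and scale pyramid are all illustrative assumptions, not values from the talk; the point is only that the count multiplies across positions, scales, and aspect ratios.

```python
def count_sliding_windows(img_w, img_h, win_w, win_h, stride, scales):
    """Count fixed-size windows slid over each rescaled copy of the image."""
    total = 0
    for s in scales:
        w, h = int(img_w * s), int(img_h * s)
        if w < win_w or h < win_h:
            continue  # the window no longer fits at this scale
        nx = (w - win_w) // stride + 1  # horizontal positions
        ny = (h - win_h) // stride + 1  # vertical positions
        total += nx * ny
    return total


# Hypothetical setup: one aspect ratio, ten scales, dense stride.
scales = [1.0 / (1.2 ** k) for k in range(10)]
per_aspect_ratio = count_sliding_windows(500, 375, 64, 64, 2, scales)
print("windows for one aspect ratio:", per_aspect_ratio)
print("windows for five aspect ratios:", 5 * per_aspect_ratio)
```

A proposal method replaces this exhaustive enumeration with a few thousand candidate boxes, which is what leaves room for a stronger classifier per box.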

Motivation
A flexible and general object-level representation with
  Hassle-free deformation handling
  Arbitrary scales and aspect ratio handling

Detection framework
Generate candidate detection bounding boxes
Boosting classifier cascades

Regionlet: Definition
Region (R): feature extraction region
Regionlet (r1, r2, r3): a sub-region in a feature extraction area whose position/resolution are relative and normalized to a detection window
[Figure labels: bounding box, region, regionlets]
Why regionlets?
  Some parts of the region may not be informative, or may even be distracting.
  Define sub-parts of regions as the basic units to extract appearance features, then organize these sub-parts into small groups, which are more flexible for describing distinct object categories with different degrees of deformation.
  The features of these sub-regions are aggregated into a single feature for R, and they sit below the level of a standalone feature extraction region in an object classifier.

Regionlet: definition (cont.) Relative normalized position
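A minimal sketch of what "relative normalized position" could look like in code: a regionlet is stored as fractions of the detection window and mapped to pixels only when a concrete window is given. The `Box` type and `to_absolute` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def to_absolute(rel: Box, window: Box) -> Box:
    """Map a box given in [0, 1] window-relative units to pixel coordinates."""
    return Box(
        x=window.x + rel.x * window.w,
        y=window.y + rel.y * window.h,
        w=rel.w * window.w,
        h=rel.h * window.h,
    )


# The same relative regionlet lands in different pixels for different windows.
regionlet = Box(0.25, 0.5, 0.5, 0.25)
print(to_absolute(regionlet, Box(100, 200, 80, 40)))   # wide window
print(to_absolute(regionlet, Box(10, 10, 60, 180)))    # tall window
```

Because only the window changes, one learned layout can be applied to candidate boxes of any scale or aspect ratio without resizing the image.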

Regionlet: Feature extraction
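The conclusions slide of this talk names max-pooling of regionlets as the aggregation step. The sketch below assumes each regionlet yields a fixed-length feature vector and that the region feature is the element-wise maximum; the function name and the toy vectors are illustrative, and the exact pooling procedure used by the authors is not given in the transcript.

```python
def max_pool_regionlets(regionlet_features):
    """Aggregate per-regionlet feature vectors into one region feature.

    Assumption: element-wise max over vectors of equal length.
    """
    if not regionlet_features:
        raise ValueError("need at least one regionlet feature vector")
    length = len(regionlet_features[0])
    if any(len(f) != length for f in regionlet_features):
        raise ValueError("all feature vectors must have the same length")
    return [max(values) for values in zip(*regionlet_features)]


# Toy features from two regionlets inside the same region.
features = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
]
print(max_pool_regionlets(features))
```

Taking the maximum means the region responds if the pattern shows up in any of its regionlets, which is one way to tolerate a part shifting position.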

Regionlet
Constructing the regions/regionlets pool
  Small region, fewer regionlets -> fine spatial layout
  Large region, more regionlets -> robust to deformation
Use cascaded boosting learning to select the most discriminative regionlets from a largely over-complete pool

Regionlet: Training
Construct regionlets pool
  Enumerating all possible regions is impractical and unnecessary.
  Propose a set of regionlets with random positions (up to 5) inside each region with identical size.
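One way to read the proposal step above is sketched here: for a given region, draw between one and five regionlets of identical size at random positions inside it. That reading, the function name, and the uniform sampling are assumptions; the slide does not specify the distribution.

```python
import random


def propose_regionlets(region, regionlet_w, regionlet_h, max_count=5, rng=None):
    """Sample up to `max_count` same-sized regionlets at random positions in `region`.

    `region` is (x, y, w, h) in window-normalized units. Uniform sampling
    is an assumption made for this sketch.
    """
    rng = rng or random.Random()
    rx, ry, rw, rh = region
    if regionlet_w > rw or regionlet_h > rh:
        raise ValueError("regionlet must fit inside the region")
    count = rng.randint(1, max_count)
    regionlets = []
    for _ in range(count):
        x = rng.uniform(rx, rx + rw - regionlet_w)
        y = rng.uniform(ry, ry + rh - regionlet_h)
        regionlets.append((x, y, regionlet_w, regionlet_h))
    return regionlets


# Hypothetical region covering part of the detection window.
for r in propose_regionlets((0.2, 0.2, 0.5, 0.4), 0.1, 0.1, rng=random.Random(0)):
    print(r)
```

Repeating this over many regions and sizes gives an over-complete pool from which boosting can later pick the useful configurations.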

Regionlet: Training
Constructing the regions/regionlets pool
Learning boosting cascades
  16K region/regionlets candidates for each cascade
  Learning of each cascade stops when the target error rate is achieved (1% for positive, 37.5% for negative)
  Last cascade stops after collecting 5000 weak classifiers
  Results in 4-7 cascades
  2-3 hours to finish training one category on an 8-core machine
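The stopping rule for a cascade stage can be sketched as below. This assumes that "1% for positive" means at most 1% of positive samples may be rejected by the stage and "37.5% for negative" means at most 37.5% of negatives may pass it; both function names and the toy scores are hypothetical, and the real training loop (weak-learner selection, reweighting) is omitted.

```python
import math


def stage_threshold(pos_scores, max_miss_rate=0.01):
    """Highest threshold that rejects at most `max_miss_rate` of the positives.

    A sample passes the stage when its score is >= the threshold.
    `pos_scores` must be non-empty.
    """
    ordered = sorted(pos_scores)
    allowed_misses = math.floor(max_miss_rate * len(ordered))
    return ordered[allowed_misses]


def stage_meets_targets(pos_scores, neg_scores, max_miss_rate=0.01, max_fp_rate=0.375):
    """Return (done, threshold, false_positive_rate) for the current stage."""
    thr = stage_threshold(pos_scores, max_miss_rate)
    passed_negatives = sum(1 for s in neg_scores if s >= thr)
    fp_rate = passed_negatives / len(neg_scores)
    return fp_rate <= max_fp_rate, thr, fp_rate


# Toy scores from a partially trained stage (illustrative only).
pos = [0.2, 0.6, 0.7, 0.8, 0.9]
neg = [0.1, 0.3, 0.65, 0.05, 0.15, 0.25, 0.0, 0.4]
print(stage_meets_targets(pos, neg))
```

When the check fails, training would add more weak classifiers to the stage and test again; when it passes, the surviving negatives feed the next stage.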

Regionlets: Testing
No image resizing
Any scale, any aspect ratio
Adapt the model size to the same size as the object candidate bounding box

Experiments
Datasets
  PASCAL VOC 2007, 2010: 20 object categories
  ImageNet Large Scale Object Detection Dataset: 200 object categories
Investigated features
  HOG
  LBP
  Covariance
  Deep Convolutional Neural Network (DCNN) feature (only for the ImageNet challenge)

Experiments: PASCAL VOC
Baselines
  DPM: excellent at detecting rigid objects, e.g. car, bus, etc.
  SS_SPM: for objects with significant global deformations, e.g. cat, cow, sheep, etc.
  SPM outperforms DPM on airplane and TV, due to very diverse viewpoints or rich sub-categories.
  Objectness (uses an old version of DPM): selective search itself does not benefit detection too much (0.6% improvement) in terms of accuracy.

Experiments: PASCAL VOC
Baselines: DPM, SPM, Objectness
Regionlets
  Won 16 out of 20 categories
  'To our best knowledge, is the best performance in VOC07'
  Multiple regionlets consistently outperform single regionlets

Experiments: PASCAL VOC

Experiments: ImageNet

Running speed
0.2 second per image using a single core if candidate bounding boxes are given; real time (>30 frames per second) using 8 cores
2 seconds per image to generate candidate bounding boxes
2-3 hours to finish training one category on an 8-core machine
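The arithmetic behind these figures can be checked with a short sketch. It assumes perfectly linear scaling across the 8 cores and that proposal generation runs serially before detection; neither assumption is stated in the talk, and the variable names are illustrative.

```python
def frames_per_second(seconds_per_image):
    """Convert a per-image latency into throughput."""
    return 1.0 / seconds_per_image


detect_s = 0.2     # detection time per image on one core (from the slide)
proposal_s = 2.0   # candidate box generation time per image (from the slide)
cores = 8

single_core_fps = frames_per_second(detect_s)             # about 5 fps
ideal_8_core_fps = single_core_fps * cores                 # about 40 fps under linear scaling
end_to_end_fps = frames_per_second(detect_s + proposal_s)  # under 1 fps with proposals included

print(f"detection only, 1 core:  {single_core_fps:.1f} fps")
print(f"detection only, 8 cores: {ideal_8_core_fps:.1f} fps (ideal scaling)")
print(f"proposals + detection:   {end_to_end_fps:.2f} fps")
```

Under these assumptions the real-time claim holds only when candidate boxes are already available, which is why proposal generation is the stage to speed up.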

Conclusions
A new object representation for object detection
  Non-local max-pooling of regionlets
  Relative normalized locations of regionlets
  Flexibility to incorporate various types of features
A principled data-driven detection framework, effective in handling deformation, multiple scales, multiple viewpoints
Superior performance with a fast running speed

Latest results Regionlets with Deep CNN feature (outside data)

Some really exciting parts discussed during the reading group will be made publicly available after the CVPR 2014 results are announced, together with source code!

My thoughts towards a real-time system
Current bottleneck is object proposal generation
  State-of-the-art methods need 2+ seconds to process one image [IJCV13, PAMI12, PAMI13]
Our recent results
  Training on VOC: 20s
  Testing speed: 300fps
  Generic ability
