BING: Binarized Normed Gradients for Objectness Estimation at 300fps

BING: Binarized Normed Gradients for Objectness Estimation at 300fps
Ming-Ming Cheng (University of Oxford), Ziming Zhang (Boston University), Wen-Yan Lin (Brookes Vision Group), Philip Torr (University of Oxford)
2014 IEEE Conference on Computer Vision and Pattern Recognition

Outline
- Introduction
- Related Works
- Methodology
- Experimental Evaluation
- Conclusion and Future Work

Introduction
- Human attention theories hypothesize that the human vision system processes only parts of an image in detail, leaving the rest nearly unprocessed.
- Before identifying objects, simple mechanisms in the human vision system select possible object locations.
- Binarized Normed Gradients ("BING") searches for objects using objectness scores: objects with well-defined closed boundaries show a surprisingly strong correlation in the norm of their gradients.

Introduction
To efficiently quantify the objectness of an image window:
- Resize the corresponding image window to a small fixed size (8×8).
- Use the norm of the gradients as a simple 64D feature for learning a generic objectness measure in a cascaded SVM framework.
- The binarized version of the NG feature, namely the binarized normed gradients (BING) feature, requires only a few atomic CPU operations (e.g., ADD, BITWISE SHIFT).
- The method is evaluated on the PASCAL VOC2007 dataset.

Introduction
Results:
- 300fps on a single laptop CPU.
- 96.2% detection rate (DR) with 1,000 windows (≈ 0.2% of full sliding windows).
- Increasing the number of object windows to 5,000 and estimating objectness in 3 different color spaces raises the DR to 99.5%.
- Training the objectness measure on 6 object categories and testing on the other 14 unseen categories gives similarly high performance as in the standard setting.
- The method uses a smaller set of proposals, is much simpler and 1,000+ times faster than alternatives, and can still predict unseen categories.

Related works
Fixation prediction
- These models aim at predicting the saliency points of human eye movements.
- The predictions tend to highlight edges and corners rather than entire objects, so they are not suitable for generating object proposals for detection.
Salient object detection
- These models try to detect the most attention-grabbing object in a scene and then segment the whole extent of that object.
- Such salient object segmentation works well for simple images and is useful in image scene analysis and content-aware image editing; it can serve as a cheap tool to process large numbers of Internet images or to build robust applications by automatically selecting good results.
- These approaches are less likely to work for complicated images in which many objects are present and rarely dominant.

Related works
Objectness proposal generation
- These methods avoid making decisions early by proposing a small number (e.g., 1,000) of category-independent proposals that are expected to cover all objects in an image.
- Producing rough segmentations as object proposals has been shown to be an effective way of reducing the search space for category-specific classifiers, while allowing strong classifiers to improve accuracy; however, it is computationally expensive.
- For efficient sliding window object detection, keeping the computational cost feasible is very important; some speed-up techniques can only accelerate classifiers for which users can provide a good bound on the highest score.

Methodology
- To find generic objects within an image, we scan over a predefined set of quantized window sizes (scales and aspect ratios).
- We test 36 quantized target window sizes {(Wo, Ho)}, where Wo, Ho ∈ {10, 20, 40, 80, 160, 320}.
- The input image is resized to 36 sizes so that 8 × 8 windows in the resized smaller images (from which we extract features) correspond to the target windows.
- Each window is scored with a linear model (see the reconstructed equation below).

Methodology
- Using non-maximal suppression (NMS), we select a small set of proposals from each size i.
- Notation: sl is the filter score, gl the NG feature, l the window location, i the size index, and (x, y) the position of a window.
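The linear scoring formula (equations (1) and (2) in the paper) appeared as an image on the slide and is missing from the transcript; the following is a reconstruction using the notation defined above:

```latex
% Reconstruction of Eq. (1)-(2): each window is scored with a learned
% 64D linear model w applied to its NG feature g_l.
s_l = \langle \mathbf{w}, \mathbf{g}_l \rangle, \qquad
l = (i, x, y)
```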

Methodology
- Some sizes (e.g., 10 × 500) are less likely than others (e.g., 100 × 100) to contain an object instance. Thus we define the objectness score (i.e., the calibrated filter score) as in (3) below, where vi, ti ∈ R are separately learnt coefficient and bias terms for each quantised size i.
- Computing (3) is very fast and is only required when re-ranking the small set of final proposals.
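The calibration formula referenced as equation (3) was also an image on the slide; a reconstruction from the coefficient and bias terms defined above:

```latex
% Reconstruction of Eq. (3): per-size calibration of the filter score s_l,
% with separately learnt coefficient v_i and bias t_i for each quantised size i.
o_l = v_i \cdot s_l + t_i, \qquad v_i, t_i \in \mathbb{R}
```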

Methodology
Normed gradients (NG) and objectness
- Objects are stand-alone things with well-defined closed boundaries and centers.
- Resizing windows that correspond to real-world objects to a small fixed size leaves little variation in how their closed boundaries can appear in such an abstracted view, even though the object (red) and non-object (green) windows show huge variation in the original image space.

Methodology
Normed gradients (NG) and objectness
- Resize the input image to the different quantized sizes and calculate the normed gradients of each resized image.
- The values in an 8 × 8 region of these resized normed-gradient maps define the 64D normed gradients (NG) feature of the corresponding window.
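A minimal C++ sketch of this NG feature extraction, assuming a plain row-major grayscale buffer; function and variable names are illustrative, not taken from the authors' code:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Normed gradient map of a W x H grayscale image (row-major BYTE values):
// ng(x, y) = min(|g_x| + |g_y|, 255), gradients from the 1-D mask [-1, 0, 1].
std::vector<uint8_t> normedGradients(const std::vector<uint8_t>& img, int W, int H) {
    std::vector<uint8_t> ng(W * H, 0);
    for (int y = 1; y < H - 1; ++y) {
        for (int x = 1; x < W - 1; ++x) {
            int gx = std::abs(int(img[y * W + x + 1]) - int(img[y * W + x - 1]));
            int gy = std::abs(int(img[(y + 1) * W + x]) - int(img[(y - 1) * W + x]));
            ng[y * W + x] = uint8_t(std::min(gx + gy, 255));
        }
    }
    return ng;
}

// 64D NG feature: the 8 x 8 block of the normed gradient map whose
// top-left corner is at (x0, y0) in the resized image.
std::vector<uint8_t> ngFeature(const std::vector<uint8_t>& ng, int W, int x0, int y0) {
    std::vector<uint8_t> f(64);
    for (int dy = 0; dy < 8; ++dy)
        for (int dx = 0; dx < 8; ++dx)
            f[dy * 8 + dx] = ng[(y0 + dy) * W + (x0 + dx)];
    return f;
}
```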

Methodology
Normed gradients (NG) and objectness
NG feature advantages:
- No matter how an object changes its position, scale, and aspect ratio, its corresponding NG feature remains roughly unchanged because of the normalized support region of the feature.
- The dense, compact representation of the NG feature makes it very efficient to compute and test, giving it great potential for real-time applications.
NG feature disadvantages:
- Loss of discriminative ability; the resulting false positives are handled by subsequent category-specific detectors.

Methodology
Learning the objectness measure with NG
To learn an objectness measure for image windows, we follow the general idea of a two-stage cascaded SVM.
- Stage I: learn a single model w for (1) using a linear SVM. Ground-truth object windows and randomly sampled background windows are used as positive and negative training samples, respectively.
- Stage II: learn vi and ti in (3) using a linear SVM. We evaluate (1) at size i on the training images, use the (NMS-)selected proposals as training samples with their filter scores as 1D features, and determine their labels from the training image annotations.
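For reference, a minimal sketch of the standard soft-margin linear-SVM objective used in both stages (the slide does not state the objective explicitly; in Stage I the feature vector is the 64D NG feature, in Stage II it is the 1D filter score):

```latex
% Standard soft-margin linear-SVM primal with hinge loss; g_n is the
% training feature vector and y_n \in \{-1, +1\} its label.
\min_{\mathbf{w}} \; \frac{1}{2}\|\mathbf{w}\|^2
  + C \sum_{n} \max\bigl(0,\, 1 - y_n \langle \mathbf{w}, \mathbf{g}_n \rangle\bigr)
```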

Methodology
Learning the objectness measure with NG
- The learned linear model w looks similar to the multi-size, center-surround patterns hypothesized as a biologically plausible architecture in primates.
- Lower object regions are more often occluded than upper parts, which is reflected by w placing less confidence in the lower regions.

Methodology
Binarized normed gradients (BING)
- To make use of recent advances in model binary approximation, we propose an accelerated version of the NG feature to speed up feature extraction and testing.
- Our learned linear model w can be approximated by a set of basis vectors using Alg. 1, where Nw denotes the number of basis vectors, aj a basis vector, and βj the corresponding coefficient (see the reconstructed equation below).
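The approximation formula was an image on the slide; a reconstruction, assuming the binary basis vectors take values in {-1, 1}^64 as the following slides imply:

```latex
% Reconstruction: the learned 64D linear model w is approximated by N_w
% binary basis vectors a_j in {-1, 1}^64 with coefficients beta_j in R (Alg. 1).
\mathbf{w} \approx \sum_{j=1}^{N_w} \beta_j \, \mathbf{a}_j
```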

Methodology
Binarized normed gradients (BING)
- By further representing each aj using a binary vector aj+ and its complement, a binarized feature b can be tested using fast BITWISE AND and BIT COUNT operations (see the reconstructed relation below).
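The decomposition itself is missing from the transcript; a reconstruction of the relation the slide refers to:

```latex
% Reconstruction: with a_j^+ in {0,1}^64 and its complement \overline{a_j^+},
% the inner product with a binarized feature b reduces to a BITWISE AND
% followed by a BIT COUNT (POPCNT).
\mathbf{a}_j = \mathbf{a}_j^{+} - \overline{\mathbf{a}_j^{+}}, \qquad
\langle \mathbf{a}_j, \mathbf{b} \rangle
  = 2\,\langle \mathbf{a}_j^{+}, \mathbf{b} \rangle - |\mathbf{b}|
```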

Methodology
Binarized normed gradients (BING)
- The key challenge is how to binarize and calculate our NG features efficiently.
- We approximate the normed gradient values (each saved as a BYTE value) using the top Ng binary bits of the BYTE values. Thus, a 64D NG feature gl can be approximated by Ng binarized normed gradients (BING) features, as reconstructed below.
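A reconstruction of the bit-plane approximation referenced as equation (5):

```latex
% Reconstruction of Eq. (5): g_l (BYTE values) approximated by its top N_g
% bit planes, each a binary BING feature b_{k,l}.
\mathbf{g}_l \approx \sum_{k=1}^{N_g} 2^{\,8-k} \, \mathbf{b}_{k,l}
```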

Methodology
- First, a BING feature bx,y and its last row rx,y can be saved in a single INT64 variable and a single BYTE variable, respectively.
- Second, adjacent BING features and their rows satisfy a simple cumulative relation, which enables atomic updates (BITWISE SHIFT and BITWISE OR) that avoid looping over the 8 × 8 window (see the reconstructed relation below).
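A reconstruction of this cumulative relation, where bx,y on the right-hand side of the first expression denotes the single binarized-gradient bit at position (x, y):

```latex
% Reconstruction: the last row r_{x,y} (BYTE) and the BING feature b_{x,y}
% (INT64) are built incrementally from their left and upper neighbours
% with one BITWISE SHIFT and one BITWISE OR each.
r_{x,y} = (r_{x-1,y} \ll 1) \,\vert\, b_{x,y}, \qquad
b_{x,y} = (b_{x,y-1} \ll 8) \,\vert\, r_{x,y}
```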

Methodology
- The BITWISE SHIFT operator shifts rx−1,y by one bit, automatically dropping the bit that does not belong to rx,y, and makes room to insert the new bit bx,y using the BITWISE OR operator.
- Similarly, BITWISE SHIFT shifts bx,y−1 by 8 bits, automatically dropping the bits that do not belong to bx,y, and makes room to insert rx,y.
- Notice that the subscripts i, x, y, l, k introduced in (2) and (5) locate whole vectors rather than index vector elements. We can therefore use a single atomic variable (INT64 or BYTE) to represent a BING feature bx,y and its last row rx,y, enabling efficient feature computation (Alg. 2).
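A minimal C++ sketch of this cumulative update for one bit plane, under the assumptions above (one INT64 per BING feature, one BYTE per last row); names are illustrative, not the paper's code:

```cpp
#include <cstdint>
#include <vector>

// Incrementally build one bit plane of BING features for a W x H binary map
// `bit`, where bit[y*W + x] is one bit (the k-th bit plane) of the normed
// gradient at (x, y). Returns the INT64 BING feature (an 8 x 8 window of
// bits) ending at every x of the last processed row, as an example output.
std::vector<uint64_t> bingBitPlane(const std::vector<uint8_t>& bit, int W, int H) {
    std::vector<uint64_t> bing(W, 0);   // b_{x,y} for the current row, one INT64 per x
    for (int y = 0; y < H; ++y) {
        uint8_t rPrev = 0;              // r_{x-1,y}, a single BYTE
        for (int x = 0; x < W; ++x) {
            // r_{x,y} = (r_{x-1,y} << 1) | b_{x,y}: the shift drops the bit
            // leaving the 8-wide window, the OR inserts the new bit.
            uint8_t r = uint8_t(rPrev << 1) | (bit[y * W + x] & 1u);
            // b_{x,y} = (b_{x,y-1} << 8) | r_{x,y}: the shift drops the row
            // leaving the 8-high window, the OR inserts the new last row.
            bing[x] = (bing[x] << 8) | r;
            rPrev = r;
            // bing[x] now holds the 8 x 8 BING feature whose bottom-right
            // corner is at (x, y); its filter score can be tested here.
        }
    }
    return bing;
}
```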

Methodology
- BING calculates a set of binary patterns over a fixed 8 × 8 range.
- The filter score (1) of an image window corresponding to BING features bk,l can be efficiently tested (see the reconstructed equation below), where the inner products can be evaluated using fast BITWISE AND and POPCNT SSE operators.
- We use the 1-D mask [−1, 0, 1] to compute image gradients gx and gy in the horizontal and vertical directions, calculating the normed gradients as min(|gx| + |gy|, 255) and saving them as BYTE values.
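A reconstruction of the approximated filter score, combining the binary approximations above; each inner product with aj+ is one BITWISE AND followed by one POPCNT:

```latex
% Reconstruction: approximated filter score of window l from its N_g BING
% features b_{k,l} and the N_w binarized basis vectors of w.
s_l \approx \sum_{j=1}^{N_w} \beta_j
            \sum_{k=1}^{N_g} 2^{\,8-k}
            \bigl( 2\,\langle \mathbf{a}_j^{+}, \mathbf{b}_{k,l} \rangle
                   - |\mathbf{b}_{k,l}| \bigr)
```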

Experimental Evaluation
- We evaluate the method on VOC2007 using the DR-#WIN evaluation metric, and compare results with 3 state-of-the-art methods [3, 48, 57] in terms of proposal quality, generalization ability, and efficiency.
- Proposal quality comparisons: the test set consists of 4,952 images with bounding-box annotations for object instances from 20 categories, with a large number of objects and high variety in category, viewpoint, scale, position, occlusion, and illumination.
[3] B. Alexe, T. Deselaers, and V. Ferrari. Measuring the objectness of image windows. IEEE TPAMI, 34(11), 2012.
[48] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 2013.
[57] Z. Zhang, J. Warrell, and P. H. Torr. Proposal generation for object detection using cascaded ranking SVMs. In CVPR, pages 1497-1504, 2011.

Experimental Evaluation
- As observed by [48], increasing the diversity of proposals by collecting results from different parameter settings improves the DR at the cost of increasing the number of proposals (#WIN).
- SEL [48] uses 80 different parameter settings to get combined results and achieves 99.1% DR using more than 10,000 proposals.
- BING achieves 99.5% DR using only 5,000 proposals by simply collecting the results from 3 color spaces (BING-diversified in Fig. 3): RGB, HSV, and GRAY.

Experimental Evaluation
Generalization ability test
- We show that our objectness proposals are generic over categories by testing the method on images containing objects whose categories are not used for training.
- We train the method using 6 object categories (i.e., the first 6 categories of the VOC dataset in alphabetical order) and test it on the remaining 14 categories.
- The statistics for training and testing on the same or different object categories are denoted BING and BING-generic, respectively.
Computational time
- High-quality object windows are produced at 300fps.
- Training on the 2,501 VOC2007 images takes much less time (20 seconds, excluding XML loading) than testing a single image with some state-of-the-art alternatives (2+ minutes).

Experimental Evaluation
- In general, BING achieves better performance than the others, and is more than three orders of magnitude (i.e., 1,000+ times) faster than the most popular alternatives [3, 22, 48].

Experimental Evaluation
- With the binary approximation to the learned linear filter and the BING features, computing the response score for each image window needs only a fixed, small number of atomic operations.
- The number of positions at each quantized scale and aspect ratio is O(N), where N is the number of pixels in the image. Thus, computing response scores at all scales and aspect ratios also has computational complexity O(N).

Experimental Evaluation
- Further, extracting the BING feature and computing the response score at each potential position (i.e., an image window) can be done using information from its 2 neighboring positions (left and upper), so the space complexity is also O(N).
- Running time is compared with the baseline methods [3, 22, 48, 57] on the same laptop with an Intel i7-3940XM CPU.
- Table (data not reproduced): average result quality (DR using 1,000 proposals) at different approximation levels, measured by Nw and Ng in Sec. 3.3; N/A represents no binarization.

Conclusion and Future Work
- We presented a fast, high-quality objectness measure using 8 × 8 binarized normed gradients (BING) features, with which computing the objectness of each image window at any scale and aspect ratio needs only a few atomic operations (e.g., ADD, BITWISE).
Limitations
- The method predicts a small set of object bounding boxes, so for some object categories a bounding box may not localize object instances as accurately as a segmentation region, e.g., a snake or wires.
Future work
- It would be interesting to introduce additional cues to further reduce the number of proposals while maintaining a high detection rate, and to explore more applications of BING.