Pedestrian Detection and Localization

Slides:



Advertisements
Similar presentations
Distinctive Image Features from Scale-Invariant Keypoints
Advertisements

Human Detection Phanindra Varma. Detection -- Overview  Human detection in static images is based on the HOG (Histogram of Oriented Gradients) encoding.
Combining Detectors for Human Hand Detection Antonio Hernández, Petia Radeva and Sergio Escalera Computer Vision Center, Universitat Autònoma de Barcelona,
Histograms of Oriented Gradients for Human Detection
Rapid Object Detection using a Boosted Cascade of Simple Features Paul Viola, Michael Jones Conference on Computer Vision and Pattern Recognition 2001.
Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.
Lecture 31: Modern object recognition
LPP-HOG: A New Local Image Descriptor for Fast Human Detection Andy Qing Jun Wang and Ru Bo Zhang IEEE International Symposium.
Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.
Activity Recognition Aneeq Zia. Agenda What is activity recognition Typical methods used for action recognition “Evaluation of local spatio-temporal features.
Presenter: Hoang, Van Dung
Enhancing Exemplar SVMs using Part Level Transfer Regularization 1.
Detecting Pedestrians by Learning Shapelet Features
More sliding window detection: Discriminative part-based models Many slides based on P. FelzenszwalbP. Felzenszwalb.
São Paulo Advanced School of Computing (SP-ASC’10). São Paulo, Brazil, July 12-17, 2010 Looking at People Using Partial Least Squares William Robson Schwartz.
Student: Yao-Sheng Wang Advisor: Prof. Sheng-Jyh Wang ARTICULATED HUMAN DETECTION 1 Department of Electronics Engineering National Chiao Tung University.
Computer and Robot Vision I
Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson
Automatic Image Alignment (feature-based) : Computational Photography Alexei Efros, CMU, Fall 2005 with a lot of slides stolen from Steve Seitz and.
Feature extraction: Corners and blobs
Object Detection using Histograms of Oriented Gradients
Scale Invariant Feature Transform (SIFT)
Generic object detection with deformable part-based models
Distinctive Image Features from Scale-Invariant Keypoints By David G. Lowe, University of British Columbia Presented by: Tim Havinga, Joël van Neerbos.
EADS DS / SDC LTIS Page 1 7 th CNES/DLR Workshop on Information Extraction and Scene Understanding for Meter Resolution Image – 29/03/07 - Oberpfaffenhofen.
Shape-Based Human Detection and Segmentation via Hierarchical Part- Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.
COMPUTER VISION: SOME CLASSICAL PROBLEMS ADWAY MITRA MACHINE LEARNING LABORATORY COMPUTER SCIENCE AND AUTOMATION INDIAN INSTITUTE OF SCIENCE June 24, 2013.
“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)
Detecting Pedestrians Using Patterns of Motion and Appearance Paul Viola Microsoft Research Irfan Ullah Dept. of Info. and Comm. Engr. Myongji University.
Marco Pedersoli, Jordi Gonzàlez, Xu Hu, and Xavier Roca
Face detection Slides adapted Grauman & Liebe’s tutorial
Visual Object Recognition
Object Detection with Discriminatively Trained Part Based Models
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Object Recognition in Images Slides originally created by Bernd Heisele.
Deformable Part Model Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 11 st, 2013.
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
Project 3 Results.
Object detection, deep learning, and R-CNNs
Histograms of Oriented Gradients for Human Detection(HOG)
Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR 2005 Another Descriptor.
Sean M. Ficht.  Problem Definition  Previous Work  Methods & Theory  Results.
Human Detection Method Combining HOG and Cumulative Sum based Binary Pattern Jong Gook Ko', Jin Woo Choi', So Hee Park', Jang Hee You', ' Electronics and.
CS 1699: Intro to Computer Vision Detection II: Deformable Part Models Prof. Adriana Kovashka University of Pittsburgh November 12, 2015.
Pedestrian Detection Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR ‘05 Pete Barnum March 8, 2006.
Feature extraction: Corners and blobs. Why extract features? Motivation: panorama stitching We have two images – how do we combine them?
Timo Ahonen, Abdenour Hadid, and Matti Pietikainen
Object Recognition as Ranking Holistic Figure-Ground Hypotheses Fuxin Li and Joao Carreira and Cristian Sminchisescu 1.
Presented by David Lee 3/20/2006
More sliding window detection: Discriminative part-based models
Finding Clusters within a Class to Improve Classification Accuracy Literature Survey Yong Jae Lee 3/6/08.
Keypoint extraction: Corners 9300 Harris Corners Pkwy, Charlotte, NC.
WLD: A Robust Local Image Descriptor Jie Chen, Shiguang Shan, Chu He, Guoying Zhao, Matti Pietikäinen, Xilin Chen, Wen Gao 报告人:蒲薇榄.
Evaluation of Gender Classification Methods with Automatically Detected and Aligned Faces Speaker: Po-Kai Shen Advisor: Tsai-Rong Chang Date: 2010/6/14.
Cascade for Fast Detection
Object detection with deformable part-based models
Presented by David Lee 3/20/2006
Performance of Computer Vision
Lit part of blue dress and shadowed part of white dress are the same color
Digit Recognition using SVMS
Object detection as supervised classification
A Tutorial on HOG Human Detection
An HOG-LBP Human Detector with Partial Occlusion Handling
Pedestrian Detection Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR ‘05 Pete Barnum March 8, 2006.
Pedestrian Detection Histograms of Oriented Gradients for Human Detection Navneet Dalal and Bill Triggs CVPR ‘05 Pete Barnum March 8, 2006.
Brief Review of Recognition + Context
AHED Automatic Human Emotion Detection
AHED Automatic Human Emotion Detection
Presentation transcript:

Pedestrian Detection and Localization Members: Đặng Trương Khánh Linh 0612743 Bùi Huỳnh Lam Bửu 0612733 Advisor: A.Professor Lê Hoài Bắc UNIVERSITY OF SCIENCE ADVANCED PROGRAM IN COMPUTER SCIENCE Year 2011

Outline Introduction Problem statement & application Challenges Existing approaches. Review HOG and SVM Motivation Overview of methodology Learning phase Detection phase Our contributions: Spatial selective approach Multi-level based approach Fusion Algorithm - Mean Shift Conclusions Future work Reference

Problem statement Build up a system which automatically detects and localizes pedestrians in static image. Pedestrians: up-right and fully visible. Our thesis goal is building up an automatic system which can detects & localizes pedestrian objects in static images. More specific our detector will scan all the given images and bound the box around object if it appears in image. Pedestrian should stand up and fully visible in the picture. Our thesis is based on Dalal work – Normalized Histogram of Oriented Gradients (HOG). We concentrate on extracting robust feature.

Applications Automated Automobile Driver , or smart camera in general. Build a software to categorize personal album images to proper catalogue. Video tracking smart surveillance. Action recognition. Develop a System for Smart car which automatically detect objects & a warning msg will appear whether the car tends to hits people or obstacle on the street. Every person has thousands of photo. Another application is a software that can automatically category personal album. Object Detection is one of the first phase of many of computer vision problems like video tracking, or action recognition.

Challenges Huge variation in intra-class. Non-constraints illumination. Variable appearance and clothing. Complex background. Occlusions, different scales. There’re challenges that make pedestrian detection has more difficult Huge variation in intra-class Variable appearance and clothing Background clutter varies from image to image. For example, images can be taken from indoor, outdoor, and under diverse natural factors such as illumination, viewpoint. Color

Existing approaches Haar wavelets + SVM: Papageorgiou & Poggio, 2000; Mohan et al 2000 Rectangular differential features + adaBoost: Viola & Jones, 2001 Model based methods: Felzenszwalb 2008 Local Binary Pattern: Wang 2009 Histogram of Oriented Gradients: Dalal and Trigg 2005 Felzenszwalb CVPR 2008 Object detection in general, or pedestrian detection in specific has attracted a lot researcher’s attention. These are some well-known approach up to decade years ago. There are Haar wavelet and Rectangle differential feature which utilize the different between two rectangle. Recently, Felzenszwalb proposed model based method which use HOG as extraction algorithm. The traditional method, SIFT, is also another well-known work. LBP use the order of pixel to construct histogram. Last but not least, HOG utilize gradients information of pixel. Wang ICCV 2009

Histogram of Oriented Gradients (HOG) Base on the gradient of pixels. Because our thesis is based on Histogram of Oriented Gradients, in short HOG. So, we will briefly review HOG method. In pre-processing, detection window is normalized to reduce the illumination effect. Gradient of pixels will be computed and vote to spatial and orientation cells. Block which consists of 4 cells will be contrast normalized and concatenated to form final window feature vector.

Histogram of Oriented Gradients Review block cell For example, sliding window will be divided into grid of points. Block includes 4 cell, each cell has a histogram constructed by pixel in this cell. 9 orientation bins 0 - 180° degrees Feature vector f = […,…,…, ,…] normalize 9x4 feature vector per cell

Histogram of Oriented Gradients Review block cell 9 orientation bins 0 - 180° degrees Feature vector f = […,…,…, ,…] normalize 9x4 feature vector per cell

Histogram of Oriented Gradients Review block cell 9 orientation bins 0 - 180° degrees Feature vector f = […,…,…, ,…] normalize 9x4 feature vector per cell

Histogram of Oriented Gradients Review block cell 9 orientation bins 0 - 180° degrees Feature vector f = […,…,…, ,…] normalize 9x4 feature vector per cell

SVM Review

Motivation of choosing HOG The blob structure based methods have fail to object detection problem. Object detection methods via edge detection are unreliable. Use the advantage of rigid shape of object. Has a good performance and low complexity. Disadvantages: Very high dimensional feature vector. Lack of multi-scale shape of object. It’s just suitable for matching problem. Affect a lot by the variation of intra-class and noise of background. Though people has diversity of shape, in specific circumstance such as walking in the street, people usually are up-right.

Contributions Re-implement HOG-based pedestrian detector. Spatial Selective Method. Multi-level Method. From the disadvantages of HOG we observe, we proposed two methods in order to overcome them. First, SSM is a method of eliminate unimportant region of image to shrink feature vector. Second, MLM get more information about object’s shape to make the feature set more robust. We also re-implement HOG of Dalal. This is not somehow re-invent the wheel, we have to do this to fully understand the philosophy under HOG method.

Dataset INRIA pedestrian dataset Train: 1208 positive windows 1218 negative images Test: 566 positive windows 453 negative images We use INRIA pedestrian dataset which is very challenge b.c people in diverse shape and complex background.

Dataset Positive images Negative images Positive windows Negative windows These are some examples of images and windows. As you can see, a window is a part of a image. In positive windows, pedestrian stands in the center of image.

Overview of methodology Learning Phase: Input: positive windows and negative images. Output: binary classifier Detection Phase: Input: arbitrary image. Output: bounding boxes containing pedestrians.

Learning Phase In learning phase, firstly, we have a training dataset which content negative& pos windows. We extract the features over windows in this training set in order to create the first classifier. But we cannot use this classifier, b/c it’s very sensitive with false positive windows. We use the first classifier to run on training negative set to get all false pos windows. After that, we add these false pos windows into training set, training again, to get the better- second classifier

Detection Phase

Result of re-implementation

Examples

Contributions PERFORMANCE (Spatial Selective Approach) IMPROVEMENTS ACCURACY (Multi-Level Approach)

Spatial Selective Approach Less informative region Descriptor By experiment, we observe that there is a small region in the center of window which mostly contains chest and stomach is less informative. We remove the small region at the center, and divide the image into 4 parts. With each part, we compute the feature vector. Finally, we concatenate 4 vectors into the feature vector of the whole img [A1,..,Z1] [A2,..,Z2] [A3,..,Z3] [A4,..,Z4] [A1,..,Z1, A2,..,Z2, A3,..,Z3, A4,..,Z4]

Spatial Selective Approach Examples: The center region mostly contains chest and stomach is less informative. Region (0) & 2 contain most information. Region (0) contains the head and left shoulder. Region (2) has information of legs. Region (1) occupies right shoulder, The (3) one is unreliable because sometimes it does not have any object information. When we test these four parts independently, their performance of them is extremely low because they lack of whole object information. One more thing that significantly affects performance is the overlap of regions. The more these regions overlap to each other, the more accuracy it is. Nonetheless, percentages of overlap of regions accompanies with the size of feature vector.

Result Spatial Selective Method The performance of this new one is approximate with the original one thought the length of new feature vector is reduced by 15-25%.

Vector Length v.s Speed A:B  Deleted cell(s): Overlap cell(s)

Examples Original Re-implement

Multi-level Approach Purpose: enhance the performance by getting more information about shape of object. Inspired by pyramid model Our goal in this method is to enhance the accuracy of the detector by adding more object’s shape information. We’re inspired by the pyramid model which is likely see object from near and far distance.

Multi-level Approach [A1,..,Z1] [A2,..,Z2] [A3,..,Z3] HOG Instead of zoom in or zoom out window like the pyramid model, we use different grid of points apply on each level. The grids are designed from fine to coarse. In the dense grid, it is likely we look object in the near distance. So we see object in detail. And in the other level, it look like we see object in far distance, we get object’s shape more general. [A1,..,Z1] [A2,..,Z2] [A3,..,Z3] [A1,..,Z1, A2,..,Z2, A3,..,Z3, A4,..,Z4]

Result(cont…)

Feature vector length v.s Time

Examples Original Multi-level

Examples (cont)

Fusion Algorithm – Mean Shift

Mean shift Region of interest Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Region of interest Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Region of interest Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Region of interest Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Region of interest Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Region of interest Center of mass Mean Shift vector Slide by Y. Ukrainitz & B. Sarel

Mean shift Region of interest Center of mass Slide by Y. Ukrainitz & B. Sarel

Mean shift clustering Cluster: all data points in the attraction basin of a mode Attraction basin: the region for which all trajectories lead to the same mode Slide by Y. Ukrainitz & B. Sarel

Non-maximum suppression Using non-maximum suppression such as mean shift to find the modes.

Conclusions Successfully re-implement HOG descriptor. Propose the Spatial Selective Approach which take advantages of less informative center region of image window. Multi-level has more information about shape of object. It is general model, and can apply to any object.

Future work Non-uniform grid of points. Combination of Spatial Selective and Multi-level approach. Combine the advantages of spatial selective method & multi-level method in order to enhance the performance & accuracy of algorithm Non-uniform grid of points, we focus on the important regions which contain more informations

Non-uniform grid of points

Demonstration

References N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Conference on Computer Vision and Pattern Recognition, 2005. Subhransu Maji et al. Classification using Intersection Kernel Support Vector Machines is Efficient. IEEE Computer Vision and Pattern Recognition 2008 C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988. D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

Scan image at all positions and scales Object/Non-object classifier

Miss rate = 1 – recall = 𝑓𝑛 𝑡𝑝+𝑓𝑛 True False tp fn fp tn

Overview of methodology