1
Pedestrian Detection and Localization
Members: Đặng Trương Khánh Linh, Bùi Huỳnh Lam Bửu
Advisor: Assoc. Prof. Lê Hoài Bắc
University of Science, Advanced Program in Computer Science
Year 2011
2
Outline
Introduction
  Problem statement & application
  Challenges
Existing approaches
  Review of HOG and SVM
Motivation
Overview of methodology
  Learning phase
  Detection phase
Our contributions
  Spatial selective approach
  Multi-level based approach
Fusion algorithm: mean shift
Conclusions
Future work
References
3
Problem statement
Build a system that automatically detects and localizes pedestrians in static images.
Pedestrians: upright and fully visible.
The goal of our thesis is to build an automatic system that detects and localizes pedestrians in static images. More specifically, our detector scans each given image and draws a bounding box around every pedestrian that appears in it. Pedestrians are assumed to be standing upright and fully visible in the picture. Our thesis builds on Dalal's work, the normalized Histogram of Oriented Gradients (HOG), and we concentrate on extracting robust features.
4
Applications
Automated driving assistance, or smart cameras in general.
Software that sorts personal album images into the proper categories.
Video tracking and smart surveillance.
Action recognition.
One application is a smart-car system that automatically detects objects and shows a warning message when the car is about to hit a person or an obstacle on the street. Another is photo management: everyone has thousands of photos, so software that automatically categorizes a personal album is useful. Object detection is also one of the first stages of many computer vision problems, such as video tracking and action recognition.
5
Challenges
Huge intra-class variation.
Unconstrained illumination.
Variable appearance and clothing.
Complex background.
Occlusions and different scales.
Several challenges make pedestrian detection difficult: huge intra-class variation, variable appearance and clothing, and background clutter that changes from image to image. For example, images can be taken indoors or outdoors, and under diverse conditions such as illumination, viewpoint, and color.
6
Existing approaches
Haar wavelets + SVM: Papageorgiou & Poggio, 2000; Mohan et al., 2000
Rectangular differential features + AdaBoost: Viola & Jones, 2001
Model-based methods: Felzenszwalb, CVPR 2008
Local Binary Pattern: Wang, ICCV 2009
Histogram of Oriented Gradients: Dalal and Triggs, 2005
Object detection in general, and pedestrian detection in particular, has attracted a lot of attention from researchers. These are some well-known approaches from the past decade. Haar wavelets and rectangular differential features both use the difference between two rectangles. More recently, Felzenszwalb proposed a model-based method that uses HOG as its feature extraction algorithm. The traditional method, SIFT, is another well-known work. LBP uses the ordering of pixel values to construct a histogram. Last but not least, HOG uses the gradient information of the pixels.
7
Histogram of Oriented Gradients (HOG)
Based on the gradients of pixels.
Because our thesis is based on the Histogram of Oriented Gradients (HOG), we briefly review the method. In pre-processing, the detection window is normalized to reduce the effect of illumination. The gradient at each pixel is then computed and votes into spatial and orientation cells. Each block, which consists of 4 cells, is contrast-normalized, and the normalized blocks are concatenated to form the final feature vector of the window.
8
Histogram of Oriented Gradients Review
[Figure labels: block, cell]
For example, the sliding window is divided by a grid of points into cells. Each block contains 4 cells, and each cell has a histogram built from the pixels inside it.
9 orientation bins, measured in degrees.
Each block is normalized, giving a 9 x 4 = 36-dimensional feature vector per block.
Feature vector f = [..., ..., ..., ...]
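The steps reviewed above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function name `hog_descriptor`, the 8-pixel cell size, the unsigned 0 to 180 degree orientation range, hard (non-interpolated) voting, and plain L2 block normalization are all assumptions chosen for brevity.

```python
import numpy as np

def hog_descriptor(window, cell=8, bins=9, eps=1e-6):
    """Sketch of a HOG descriptor for one grayscale detection window."""
    window = window.astype(np.float64)
    # Centered [-1, 0, 1] differences; border pixels keep a zero gradient.
    gx = np.zeros_like(window)
    gy = np.zeros_like(window)
    gx[:, 1:-1] = window[:, 2:] - window[:, :-2]
    gy[1:-1, :] = window[2:, :] - window[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees (assumed range).
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    rows, cols = window.shape[0] // cell, window.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            # Each pixel votes for one orientation bin, weighted by magnitude.
            hist[r, c] = np.bincount(bin_index[sl].ravel(),
                                     weights=magnitude[sl].ravel(),
                                     minlength=bins)

    # Blocks of 2 x 2 cells, shifted one cell at a time, each L2-normalized.
    blocks = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            v = hist[r:r + 2, c:c + 2].ravel()  # 9 bins x 4 cells = 36 values
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)
```

Under these assumptions, a 128 x 64 pixel window yields 15 x 7 overlapping blocks of 36 values each, which illustrates why the feature vector becomes very high-dimensional.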
12
SVM Review
13
Motivation of choosing HOG
Blob-structure-based methods have failed on the object detection problem.
Object detection via edge detection is unreliable.
HOG exploits the rigid shape of the object.
It has good performance and low complexity.
Disadvantages:
Very high-dimensional feature vector.
Lacks multi-scale information about the object's shape; it is only suitable for matching problems.
Strongly affected by intra-class variation and background noise.
Although people have diverse shapes, in specific circumstances such as walking in the street they are usually upright.
14
Contributions
Re-implementation of the HOG-based pedestrian detector.
Spatial Selective Method (SSM).
Multi-level Method (MLM).
Starting from the disadvantages of HOG that we observed, we propose two methods to overcome them. First, SSM eliminates unimportant regions of the image to shrink the feature vector. Second, MLM captures more information about the object's shape to make the feature set more robust. We also re-implemented Dalal's HOG. This is not reinventing the wheel: we had to do it to fully understand the ideas behind the HOG method.
15
Dataset
INRIA pedestrian dataset
Train: 1208 positive windows, 1218 negative images
Test: 566 positive windows, 453 negative images
We use the INRIA pedestrian dataset, which is very challenging because the people appear in diverse shapes and against complex backgrounds.
16
Dataset
[Figure panels: positive images, negative images, positive windows, negative windows]
These are some examples of images and windows. As you can see, a window is a part of an image. In positive windows, the pedestrian stands in the center of the window.
17
Overview of methodology
Learning phase:
  Input: positive windows and negative images.
  Output: binary classifier.
Detection phase:
  Input: arbitrary image.
  Output: bounding boxes containing pedestrians.
18
Learning Phase
In the learning phase, we start with a training set that contains positive and negative windows. We extract features from every window in this set to train a first classifier. We cannot use this classifier directly, because it produces many false positive windows. Instead, we run the first classifier over the negative training images and collect every false positive window. We then add these false positives to the training set as negatives and train again to obtain a better, second classifier.
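The two-round training described above can be sketched as follows. This is only an illustration under stated assumptions: the slides use an SVM, but to stay dependency-free this sketch substitutes a ridge-regression linear classifier (`train_linear`), and all function names and the feature arrays are hypothetical.

```python
import numpy as np

def train_linear(X, y, reg=1e-3):
    """Stand-in for linear SVM training: ridge regression onto labels +1 / -1."""
    Xb = np.hstack([X, np.ones((len(X), 1))])      # append a bias column
    A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def score(w, X):
    """Signed classifier score; positive means 'pedestrian'."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def train_with_hard_negatives(pos, neg_initial, neg_pool):
    """Train a first classifier, mine its false positives, then retrain."""
    X = np.vstack([pos, neg_initial])
    y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg_initial))])
    w1 = train_linear(X, y)
    # Windows taken from negative images that the first classifier accepts
    # are, by construction, false positives.
    hard = neg_pool[score(w1, neg_pool) > 0]
    X2 = np.vstack([X, hard])
    y2 = np.concatenate([y, -np.ones(len(hard))])
    w2 = train_linear(X2, y2)
    return w1, w2, hard
```

Here `neg_pool` stands for the feature vectors of all windows scanned from the negative training images; the rows in `hard` are the ones added back as extra negatives for the second round.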
19
Detection Phase
20
Result of re-implementation
21
Examples
22
Contributions
Improvements:
Performance (Spatial Selective Approach)
Accuracy (Multi-Level Approach)
23
Spatial Selective Approach
[Figure labels: less informative region, descriptor]
By experiment, we observed that a small region at the center of the window, which mostly contains the chest and stomach, is less informative. We remove this small center region and divide the image into 4 parts. For each part, we compute a feature vector. Finally, we concatenate the 4 vectors into the feature vector of the whole image.
[A1,..,Z1] [A2,..,Z2] [A3,..,Z3] [A4,..,Z4] → [A1,..,Z1, A2,..,Z2, A3,..,Z3, A4,..,Z4]
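A minimal sketch of this idea, working on a grid of per-cell histograms, might look like the following. The exact region layout used in the thesis is not given in the slides, so the four non-overlapping strips around the center hole, the hole coordinates, and the name `spatial_selective_feature` are assumptions made for illustration only.

```python
import numpy as np

def spatial_selective_feature(cells, hole_rows, hole_cols):
    """Drop a central block of cells and concatenate the four remaining parts.

    cells:     array of shape (rows, cols, bins) holding per-cell histograms
    hole_rows: (r0, r1) row range of the removed center region
    hole_cols: (c0, c1) column range of the removed center region
    """
    r0, r1 = hole_rows
    c0, c1 = hole_cols
    R, C = cells.shape[:2]
    # Hypothetical layout: four strips arranged around the hole so that
    # together they cover every cell outside it exactly once.
    parts = [
        cells[0:r0, 0:c1],   # part 0: strip above the hole
        cells[0:r1, c1:C],   # part 1: strip to the right of the hole
        cells[r1:R, c0:C],   # part 2: strip below the hole
        cells[r0:R, 0:c0],   # part 3: strip to the left of the hole
    ]
    # One feature vector per part, concatenated into the window descriptor.
    return np.concatenate([p.ravel() for p in parts])
```

For example, with a 16 x 8 grid of 9-bin cells and a hole covering rows 5 to 10 and columns 2 to 5, 24 of the 128 cells are dropped, so the vector shrinks from 1152 to 936 values. This sketch omits the overlap between regions that the presentation discusses.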
24
Spatial Selective Approach
Examples: The center region, which mostly contains the chest and stomach, is less informative. Regions (0) and (2) contain the most information. Region (0) contains the head and left shoulder. Region (2) carries the information about the legs. Region (1) covers the right shoulder. Region (3) is unreliable because it sometimes contains no object information at all. When we test these four parts independently, their performance is extremely low because each one lacks information about the whole object. Another factor that significantly affects performance is the overlap between regions: the more the regions overlap, the higher the accuracy. However, a larger overlap also means a longer feature vector.
25
Result: Spatial Selective Method
The performance of the new descriptor is approximately the same as that of the original, even though the feature vector is 15-25% shorter.
26
Vector length vs. speed
Legend: A:B = deleted cell(s) : overlap cell(s)
27
Examples Original Re-implement
28
Multi-level Approach
Purpose: enhance the performance by capturing more information about the shape of the object.
Inspired by the pyramid model.
Our goal with this method is to improve the accuracy of the detector by adding more information about the object's shape. We are inspired by the pyramid model, which is like seeing the object from both near and far distances.
29
Multi-level Approach
[Figure label: HOG]
Instead of zooming the window in or out as the pyramid model does, we apply a different grid of points at each level. The grids range from fine to coarse. With the dense grid, it is as if we look at the object from nearby, so we see it in detail. At the other levels, it is as if we look at the object from far away, so we capture its overall shape.
[A1,..,Z1] [A2,..,Z2] [A3,..,Z3] → [A1,..,Z1, A2,..,Z2, A3,..,Z3]
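The fine-to-coarse idea can be sketched as computing cell histograms on several grids over the same window and concatenating them. This is a simplified illustration rather than the authors' code: the cell sizes 8, 16, and 32 pixels, the per-level L2 normalization, and the function names are assumptions, and block normalization is left out to keep the example short.

```python
import numpy as np

def cell_histograms(window, cell, bins=9):
    """Orientation histograms on a grid of cell x cell pixel cells."""
    window = window.astype(np.float64)
    gy, gx = np.gradient(window)                 # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_index = np.minimum((angle * bins / 180.0).astype(int), bins - 1)
    rows, cols = window.shape[0] // cell, window.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            hist[r, c] = np.bincount(bin_index[sl].ravel(),
                                     weights=magnitude[sl].ravel(),
                                     minlength=bins)
    return hist

def multi_level_feature(window, cell_sizes=(8, 16, 32), eps=1e-6):
    """Concatenate descriptors computed on grids from fine to coarse."""
    levels = []
    for cell in cell_sizes:
        v = cell_histograms(window, cell).ravel()
        levels.append(v / (np.linalg.norm(v) + eps))  # normalize each level
    return np.concatenate(levels)
```

The window itself is never resized; only the grid changes, so the coarse levels summarize the overall shape while the fine level keeps local detail.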
30
Result(cont…)
31
Feature vector length vs. time
32
Examples Original Multi-level
33
Examples (cont)
34
Fusion Algorithm – Mean Shift
35
Mean shift Region of interest Center of mass Mean Shift vector
Slide by Y. Ukrainitz & B. Sarel
42
Mean shift clustering
Cluster: all data points in the attraction basin of a mode.
Attraction basin: the region for which all trajectories lead to the same mode.
Slide by Y. Ukrainitz & B. Sarel
43
Non-maximum suppression
We use a non-maximum suppression method, such as mean shift, to find the modes.
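As a rough illustration of mode finding with mean shift, the sketch below moves every detection toward the weighted center of mass of its neighborhood until it stops moving, then merges detections that end at the same place. The Gaussian kernel, the `bandwidth` parameter, the merge threshold of half a bandwidth, and the name `mean_shift_modes` are assumptions; the slides do not specify these details.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, weights=None, iters=100, tol=1e-5):
    """Find density modes of a point set with Gaussian-kernel mean shift."""
    points = np.asarray(points, dtype=np.float64)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=np.float64)
    modes = points.copy()                        # one trajectory per point
    for _ in range(iters):
        # Kernel weight of every data point as seen from every estimate.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        k = weights[None, :] * np.exp(-0.5 * d2 / bandwidth ** 2)
        # New estimate = weighted center of mass; the move is the shift vector.
        shifted = (k @ points) / k.sum(axis=1, keepdims=True)
        converged = np.abs(shifted - modes).max() < tol
        modes = shifted
        if converged:
            break
    # Trajectories that end at (almost) the same location share one mode.
    merged = []
    for m in modes:
        if not any(np.linalg.norm(m - q) < bandwidth / 2 for q in merged):
            merged.append(m)
    return np.array(merged)
```

In a detector, `points` could hold the centers (and possibly the scale) of all windows classified as pedestrians and `weights` their classifier scores, so each returned mode corresponds to one fused detection.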
44
Conclusions
Successfully re-implemented the HOG descriptor.
Proposed the Spatial Selective Approach, which takes advantage of the fact that the center region of the image window is less informative.
The Multi-level approach captures more information about the shape of the object. It is a general model and can be applied to any object.
45
Future work
Non-uniform grid of points.
Combination of the Spatial Selective and Multi-level approaches.
We plan to combine the advantages of the spatial selective method and the multi-level method in order to improve both the performance and the accuracy of the algorithm. With a non-uniform grid of points, we can focus on the important regions that contain more information.
46
Non-uniform grid of points
47
Demonstration
48
References
N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition, 2005.
S. Maji et al. Classification using intersection kernel support vector machines is efficient. In IEEE Conference on Computer Vision and Pattern Recognition, 2008.
C. Harris and M. Stephens. A combined corner and edge detector. In Alvey Vision Conference, pages 147–151, 1988.
D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.
49
Scan image at all positions and scales
Object/Non-object classifier
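Scanning at all positions and scales can be sketched as a generator of windows over a progressively downsampled image. This is an illustrative sketch only: the 64 x 128 window, the 8-pixel stride, the scale step of 1.2, the nearest-neighbor downsampling, and the `classify` callable are assumptions, not details taken from the slides.

```python
import numpy as np

def sliding_windows(image, win_h=128, win_w=64, stride=8, scale_step=1.2):
    """Yield (x, y, scale, patch) for every window position at every scale."""
    scale = 1.0
    current = image
    while current.shape[0] >= win_h and current.shape[1] >= win_w:
        for y in range(0, current.shape[0] - win_h + 1, stride):
            for x in range(0, current.shape[1] - win_w + 1, stride):
                yield x, y, scale, current[y:y + win_h, x:x + win_w]
        # Shrink the image so the fixed-size window covers larger objects.
        scale *= scale_step
        new_h = int(image.shape[0] / scale)
        new_w = int(image.shape[1] / scale)
        # Nearest-neighbor downsampling keeps the sketch dependency-free.
        rows = (np.arange(new_h) * scale).astype(int)
        cols = (np.arange(new_w) * scale).astype(int)
        current = image[rows][:, cols]

def detect(image, classify, **kwargs):
    """Return (x, y, w, h, score) boxes in original-image coordinates."""
    boxes = []
    for x, y, scale, patch in sliding_windows(image, **kwargs):
        s = classify(patch)                 # object / non-object decision
        if s > 0:
            h, w = patch.shape[:2]
            boxes.append((x * scale, y * scale, w * scale, h * scale, s))
    return boxes
```

Here `classify` is a hypothetical function that extracts the window's features and returns a signed score, with positive values meaning "pedestrian". Because neighboring windows around the same person tend to fire together, the resulting boxes usually need a fusion step afterwards.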
50
Miss rate = 1 - recall = fn / (tp + fn)

                   Predicted true   Predicted false
Actual positive         tp               fn
Actual negative         fp               tn
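As a small worked example of the formula, the helper functions below (hypothetical names) compute recall and miss rate from the confusion-matrix counts.

```python
def recall(tp, fn):
    """Fraction of actual pedestrians that the detector finds."""
    return tp / (tp + fn)

def miss_rate(tp, fn):
    """Fraction of actual pedestrians that the detector misses."""
    return fn / (tp + fn)
```

With 90 pedestrians detected and 10 missed, the recall is 0.9 and the miss rate is 0.1, and the two always sum to 1.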