Introduction of Pedestrian Detection

Slides:

Advertisements

Similar presentations

Human Detection Phanindra Varma. Detection -- Overview  Human detection in static images is based on the HOG (Histogram of Oriented Gradients) encoding.

Advertisements

EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.

Rapid Object Detection using a Boosted Cascade of Simple Features Paul Viola, Michael Jones Conference on Computer Vision and Pattern Recognition 2001.

Computer Vision for Human-Computer InteractionResearch Group, Universität Karlsruhe (TH) cv:hci Dr. Edgar Seemann 1 Computer Vision: Histograms of Oriented.

Many slides based on P. FelzenszwalbP. Felzenszwalb General object detection with deformable part-based models.

AdaBoost & Its Applications

Face detection Many slides adapted from P. Viola.

Cos 429: Face Detection (Part 2) Viola-Jones and AdaBoost Guest Instructor: Andras Ferencz (Your Regular Instructor: Fei-Fei Li) Thanks to Fei-Fei Li,

EE462 MLCV Lecture 5-6 Object Detection – Boosting Tae-Kyun Kim.

Viola/Jones: features “Rectangle filters” Differences between sums of pixels in adjacent rectangles { y t (x) = +1 if h t (x) >  t -1 otherwise Unique.

Detecting Pedestrians by Learning Shapelet Features

EECS 274 Computer Vision Object detection. Human detection HOG features Cue integration Ensemble of classifiers ROC curve Reading: Assigned papers.

São Paulo Advanced School of Computing (SP-ASC’10). São Paulo, Brazil, July 12-17, 2010 Looking at People Using Partial Least Squares William Robson Schwartz.

The Viola/Jones Face Detector Prepared with figures taken from “Robust real-time object detection” CRL 2001/01, February 2001.

The Viola/Jones Face Detector (2001)

HCI Final Project Robust Real Time Face Detection Paul Viola, Michael Jones, Robust Real-Time Face Detetion, International Journal of Computer Vision,

Generic Object Detection using Feature Maps Oscar Danielsson Stefan Carlsson

Ensemble Tracking Shai Avidan IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE February 2007.

A Robust Real Time Face Detection. Outline  AdaBoost – Learning Algorithm  Face Detection in real life  Using AdaBoost for Face Detection  Improvements.

A Robust Real Time Face Detection. Outline  AdaBoost – Learning Algorithm  Face Detection in real life  Using AdaBoost for Face Detection  Improvements.

Robust Real-Time Object Detection Paul Viola & Michael Jones.

Object Detection Using the Statistics of Parts Henry Schneiderman Takeo Kanade Presented by : Sameer Shirdhonkar December 11, 2003.

Fast Human Detection Using a Novel Boosted Cascading Structure With Meta Stages Yu-Ting Chen and Chu-Song Chen, Member, IEEE.

Foundations of Computer Vision Rapid object / face detection using a Boosted Cascade of Simple features Presented by Christos Stoilas Rapid object / face.

FACE DETECTION AND RECOGNITION By: Paranjith Singh Lohiya Ravi Babu Lavu.

Generic object detection with deformable part-based models

Face Detection using the Viola-Jones Method

Human tracking and counting using the KINECT range sensor based on Adaboost and Kalman Filter ISVC 2013.

A Tutorial on Object Detection Using OpenCV

Shape-Based Human Detection and Segmentation via Hierarchical Part- Template Matching Zhe Lin, Member, IEEE Larry S. Davis, Fellow, IEEE IEEE TRANSACTIONS.

“Secret” of Object Detection Zheng Wu (Summer intern in MSRNE) Sep. 3, 2010 Joint work with Ce Liu (MSRNE) William T. Freeman (MIT) Adam Kalai (MSRNE)

Detecting Pedestrians Using Patterns of Motion and Appearance Paul Viola Microsoft Research Irfan Ullah Dept. of Info. and Comm. Engr. Myongji University.

Window-based models for generic object detection Mei-Chen Yeh 04/24/2012.

Sign Classification Boosted Cascade of Classifiers using University of Southern California Thang Dinh Eunyoung Kim

Lecture 29: Face Detection Revisited CS4670 / 5670: Computer Vision Noah Snavely.

Face detection Slides adapted Grauman & Liebe’s tutorial

Visual Object Recognition

DIEGO AGUIRRE COMPUTER VISION INTRODUCTION 1. QUESTION What is Computer Vision? 2.

Beyond Sliding Windows: Object Localization by Efficient Subwindow Search The best paper prize at CVPR 2008.

Recognition II Ali Farhadi. We have talked about Nearest Neighbor Naïve Bayes Logistic Regression Boosting.

Stable Multi-Target Tracking in Real-Time Surveillance Video

Robust Real Time Face Detection

The Viola/Jones Face Detector A “paradigmatic” method for real-time object detection Training is slow, but detection is very fast Key ideas Integral images.

Bibek Jang Karki. Outline Integral Image Representation of image in summation format AdaBoost Ranking of features Combining best features to form strong.

Object Detection Overview Viola-Jones Dalal-Triggs Deformable models Deep learning.

Learning to Detect Faces A Large-Scale Application of Machine Learning (This material is not in the text: for further information see the paper by P.

FACE DETECTION : AMIT BHAMARE. WHAT IS FACE DETECTION ? Face detection is computer based technology which detect the face in digital image. Trivial task.

Week 10 Emily Hand UNR.

CS-498 Computer Vision Week 9, Class 2 and Week 10, Class 1

A Brief Introduction on Face Detection Mei-Chen Yeh 04/06/2010 P. Viola and M. J. Jones, Robust Real-Time Face Detection, IJCV 2004.

More sliding window detection: Discriminative part-based models

Face detection Many slides adapted from P. Viola.

Week 4: 6/6 – 6/10 Jeffrey Loppert. This week.. Coded a Histogram of Oriented Gradients (HOG) Feature Extractor Extracted features from positive and negative.

1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.

Reading: R. Schapire, A brief introduction to boosting

Cascade for Fast Detection

License Plate Detection

Object detection with deformable part-based models

Performance of Computer Vision

Presented by Minh Hoai Nguyen Date: 28 March 2007

Lit part of blue dress and shadowed part of white dress are the same color

R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.

A Tutorial on HOG Human Detection

Cos 429: Face Detection (Part 2) Viola-Jones and AdaBoost Guest Instructor: Andras Ferencz (Your Regular Instructor: Fei-Fei Li) Thanks to Fei-Fei.

ADABOOST(Adaptative Boosting)

Progress report 2019/1/14 PHHung.

A Tutorial on Object Detection Using OpenCV

Lecture 29: Face Detection Revisited

Jie Chen, Shiguang Shan, Shengye Yan, Xilin Chen, Wen Gao

ECE738 Final Project Face Detection Baseline

Presentation transcript:

Introduction of Pedestrian Detection Cheng-Yang Fu 02/04/2015

Source PAMI 2009

Why Pedestrian Detection is hard? Articulated pose Clothing Lightning Background Occlusion Talk about why Pedestrian Detection is hard? A lot of variety

Why Pedestrian Detection is hard? Articulated pose Clothing Lightning Background Occlusion Credit from : Markerless Articulated Human Body Tracking from Multi-View Video with GPU-PSO

Why Pedestrian Detection is hard? Articulated pose Clothing Lightning Background Occlusion Credit from : Retrieving Similar Styles to Parse Clothing

Why Pedestrian Detection is hard? Articulated pose Clothing Lightning Background Occlusion

Why Pedestrian Detection is hard? Articulated pose Clothing Lightning Background Occlusion City Rural Area Desert

Why Pedestrian Detection is hard? articulated pose Clothing Lightning Background Occlusion Credit from : CVC-05 Partially Occluded Pedestrian Dataset

State-of-the-art systems of Pedestrian Detection Resolution Speed HoG / Linear SVM Higher Low Wavelet-based AdaBoost cascade Lower near real time

Extract Training Sample Flowchart Training Testing Extract Training Sample Multiple Scales Feature Extraction Sliding Window Classifier Selection Feature Extraction Validation Classify

Extract Training Sample Feature Extraction Classifier Selection Validation Extract Training Sample Dataset : Daimler-DB Positive Negative Test source : Monocular Pedestrian Detection Survey and Experiments, PAMI 2009

Extract Training Sample Feature Extraction Classifier Selection Validation Extract Training Sample source :Pedestrian Detection: An Evaluation of the State of the Art, PAMI 2012

Statics of pedestrian dataset a variety of statistics on current pedestrian detection dataset : height distribution, position distribution, occlusion distribution source :Pedestrian Detection: An Evaluation of the State of the Art, PAMI 2012

HoG + Linear SVM

Extract Training Sample Feature Extract Feature Extraction HoG ( Histogram of Orientated Gradient) Classifier Selection Introduce HoG Validation

HoG Gradient Kernel Magnitude Orientation v h 1 -1 -1 1 -1 -1 1 How to calculate gradient ? 1 use two kernels to decide the magnitudes of horizontal and vertical. 2 Then use simple calculus to decide the magnitude and orientation v h

Figure from : Dr. Edgar Seemann HoG Orientation Histogram 9 bins for gradient orientations (0 - 180 degrees) Cell size : typically 8x8 pixels number of cells : 16x8 Quantization Number is not important, because these are adjustable. Based on your application Figure from : Dr. Edgar Seemann

Figure from : Dr. Edgar Seemann HoG Block Normalization Normalize histograms within overlapping blocks of cells(typically 2x2 cells, i.e. 7x15 blocks) Quantization Number is not important, because these are adjustable. Based on your application Figure from : Dr. Edgar Seemann

HoG Block Normalization Normalize histograms within overlapping blocks of cells(typically 2x2 cells, i.e. 7x15 blocks) feature = 9 *4 HoG =[ 1st rectangles] Quantization Number is not important, because these are adjustable. Based on your application

HoG Block Normalization Normalize histograms within overlapping blocks of cells(typically 2x2 cells, i.e. 7x15 blocks) feature = 9 *4 HoG =[ 1st rectangles; 2st rectangles] Quantization Number is not important, because these are adjustable. Based on your application

HoG Block Normalization Normalize histograms within overlapping blocks of cells(typically 2x2 cells, i.e. 7x15 blocks) feature = 9 *4 HoG =[ 1st rectangles; 2st rectangles; … 7x15 rectangles;] Quantization Number is not important, because these are adjustable. Based on your application

Figure from : Dr. Edgar Seemann HoG Concatenate features Each cell -> 9 Each block ( 2x2 cell) -> 2 * 2* 9 = 36 Total blocks (7x15) -> 7*15*36 = 3780 Talk about feature dimension and computation efficiency problem ! Figure from : Dr. Edgar Seemann

Extract Training Sample Classifier Selection Feature Extraction Linear SVM Classifier Selection Introduce HoG Validation source :Pedestrian Detection: An Evaluation of the State of the Art, PAMI 2012

The process of Detection Multiple Scales Sliding Window Feature Extraction Classify

The process of Detection Multiple Scales Generate multiple images on different scales Sliding Window Why need this step ? Size invariant Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

The process of Detection Multiple Scales use sliding window on all locations Sliding Window Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

The process of Detection Multiple Scales use sliding window on all locations Sliding Window Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

The process of Detection Multiple Scales use sliding window on all locations Sliding Window Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

The process of Detection Multiple Scales use sliding window on all locations normal image size : 480*640 Sliding Window Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

The process of Detection Multiple Scales Extract HoG from current window Sliding Window Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

The process of Detection Multiple Scales Use LinearSVM to classify Accurate Criteria Normally if >= 0.5, we consider it is correct. Sliding Window intersection over union Feature Extraction Classify Figure from : Pedestrian detection at 100 frames per second

Wavelet-based AdaBoost cascade

Haar Wavelet A set of predefined wavelets 40 80

= sum(white) - sum(black) Haar Wavelet A set of predefined wavelets Use the predefined wavelet to extract feature! 40 = sum(white) - sum(black) 80

= sum(white) - sum(black) Haar Wavelet A set of predefined wavelets Use the predefined wavelet to extract feature! 40 = sum(white) - sum(black) 80

Haar Wavelet Calculate on different location 40 40 80 80

Haar Wavelet Calculate on different location 40 40 80 80

Problem ? Length of Features For example : 14 waveslets* 40x80 locations * 3scales * 3channels = 403,200 Curse of high dimensionality How do we solve this problem ? The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. Cited from Wikipedia

Classifier : AdaBoost Feature Selection in Training Randomly Sample a set of features A weighted linear combination of selected features Only select a subset of features ! How to select? Should we select the best N features? No, normally a lot of features are overlapped

Speed up the feature extraction : Integral Images More efficient way to calculate the sum of window Integral Image : value at (x, y) = sum of all the pixels above and to the left. How to calculate D ? Answer : 4+1 - (2+3) Source from : Robust Real-Time Face Detection

Cascade Model To handle sliding window problems h*w resolution, s scales e.g. at run time, we need to do O(h*w*s) class- action

Cascade Model False Input Image … True Basic Idea : At Stage 1 is there a pedestrian in the current window? Yes -> go to stage 2 No -> this window does not contain a pedestrian False Input Image AdaBoost Classifier 1 True AdaBoost Classifier 2 True True AdaBoost Classifier N … True

Cascade Model False Input Image … True Basic Idea : At Stage 2 is there a pedestrian in the current window? Yes -> go to stage 3 No -> this window does not contain a pedestrian False Input Image AdaBoost Classifier 1 True AdaBoost Classifier 2 True True AdaBoost Classifier N … True

Cascade Model False Input Image … True Basic Idea : At Stage N (last one) is there a pedestrian in the current window? Yes -> Contain a pedestrian No -> this window does not contain a pedestrian False Input Image AdaBoost Classifier 1 True AdaBoost Classifier 2 True True AdaBoost Classifier N … True

Cascade Model : Training At Stage 1 initial N pedestrian training samples Randomly sample N nonpedestrian samples Train criteria -> 50% false positive, at 99.5% detection rate False AdaBoost Classifier True AdaBoost Classifier True True AdaBoost Classifier … True

Cascade Model : Training At Stage 2 same N pedestrian training samples Collect a set of N false positives of the cascade up to the previous layers Train criteria -> 50% false positive, at 99.5% detection rate False AdaBoost Classifier True AdaBoost Classifier True True AdaBoost Classifier … True

Cascade Model Idea The first stages prune most of the false positives. Average computation time is significantly reduced More efficiency in testing time.

State-Of-the-Arts Pedestrian detection at 100 frames per second, CVPR 2012 Extract the learned features more efficient FAST , 100 frames per second Use CPU+GPU https://www.youtube.com/watch?v=htLRKbVaSDw

More Complicated features from Haar Wavelets Channels talk about more complicated wavelets

N Models, 1 image scale scales N is usually in the order of ~50 scales Need a lot of data for this method! overfitting

1 model, N image scales training a canonical scale is delicate how to decide the size? high resolution scales and blurry low scales Recompute features multiple times in runtime High resolution image and blurry low resolution image will exist and same time

1 model, N/K image scales rescale at feature space directly. HoG cannot

N/K model, 1 image scales Get approximated models quickly for example, N=50, K=10 5 models *K = 50 Instead of retrained models,they synthesize more models from current 5 models

The weaker learner In Random Forests or Adaboost A weak learner consists of a channel rectangles(size, location) a decision threshold

The weaker learner After rescaling S A weak learner consists of a channel -> same rectangles ->scale by S a decision threshold ->scale by r(S) Approximation get new weak learner by approximation quickly

Ideas Use features which can be rescaling directly Get multiple classifier models by synthesizing from few models

What’s more ? Different features Current HoG is dominant, people tries to concatenate this with other features Extract HoG on different color space (LUV) Learned from neural networks(deep learning)

What’s more ? Different Classifiers NonLinear SVM ? too slow in testing time Efficient kernel in testing time? Random Forests Neural Network(Deep Learning) random forests can do several computer vision problems in real time. Kinect

What’s more ? Reduce number of windows Cascade models BING: Binarized Normed Gradients for Objectness Estimation at 300fps, CVPR 2014

More about current status Monocular Pedestrian Detection: Survey and Experiments, Markus Enzweiler, and Dariu M. Gavrila, PAMI 2009 Pedestrian Detection: An Evaluation of the State of the Art, , Piotr Doll ´ ar, Christian Wojek, Bernt Schiele, and Pietro Perona, PAMI 2012

Thank you for your Attention & Questions