Boosted Histograms for Improved Object Detection
Ivan Laptev, IRISA/INRIA, Rennes, France
September 07, 2006
Histograms for object recognition
Remarkable success of recognition methods using histograms of local image measurements:
  [Swain & Ballard 1991] Color histograms
  [Schiele & Crowley 1996] Receptive field histograms
  [Lowe 1999] Localized orientation histograms (SIFT)
  [Schneiderman & Kanade 2000] Localized histograms of wavelet coefficients
  [Leung & Malik 2001] Texton histograms
  [Belongie et al. 2002] Shape context
  [Dalal & Triggs 2005] Dense orientation histograms
Likely explanation: histograms are robust to image variations such as limited geometric transformations and object class variability.
Histograms: What vs. Where
What to measure?
  Color [SB91]; Gaussian derivatives [SC96]; Wavelet coefficients [SK00]; Textons [LM01]; Gradient orientation [L99, DT05]
Where to measure?
  Whole image [SB91, SC96]; Pre-defined grid [SK00, BMP02, DT05]; Key points [L99]
No guarantee for optimal recognition: different regions may have different discriminative power.
Idea
AdaBoost: efficient discriminative classifier [Freund & Schapire ’97]; good performance for face detection [Viola & Jones ’01]
  Boosting: selected features, weak classifier
  Features: Haar features, Histogram features
  SVM, Neural Networks: too heavy
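For readers unfamiliar with the boosting step named on this slide, here is a minimal sketch of discrete AdaBoost with simple threshold weak classifiers. It is not the detector from the talk: the function names `train_adaboost` and `predict_adaboost`, the exhaustive threshold search, and the toy data layout (one column per candidate feature) are assumptions made for illustration.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with threshold stumps; X is (n, d), y is in {-1, +1}."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # sample weights, updated every round
    ensemble = []                        # (feature index, threshold, polarity, alpha)
    for _ in range(n_rounds):
        best = None
        for j in range(d):               # every candidate feature gives weak classifiers
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()      # weighted training error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)      # avoid division by zero / log(0)
        alpha = 0.5 * np.log((1 - err) / err)     # vote of the selected weak classifier
        w *= np.exp(-alpha * y * pred)            # raise weights of misclassified samples
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def predict_adaboost(ensemble, X):
    """Sign of the alpha-weighted vote of all selected weak classifiers."""
    score = np.zeros(X.shape[0])
    for j, thr, pol, alpha in ensemble:
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.where(score >= 0, 1, -1)
```

Each round selects one feature, which is how boosting doubles as feature selection; the weights `w` are what a weak learner must respect when it is trained on later rounds.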
Weak learner. Possible approach: 1-dimensional projections onto predefined vectors. Example 1
Weak learner. Possible approach: 1-dimensional projections onto predefined vectors. Example 2
Fisher weak learner
  Alternative approach: assume a Normal distribution of features (hopefully valid at least for some of the ~10^5 features!)
  Compute the projection direction by FLD, using the feature mean and feature covariance
  Can be modified to minimize the error of weighted samples (required for boosting)
  Evidence from real image training data: Fisher learner, “1-bin” learner
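The Fisher weak learner can be sketched as follows, under the standard Fisher linear discriminant (FLD) formulation: the projection direction is the inverse of the summed class covariances applied to the difference of class means, with means and covariances computed from the boosting weights. The function names, the small regularization term `reg`, and the midpoint threshold search are assumptions for illustration, not details taken from the slides.

```python
import numpy as np

def fit_fld_weak_learner(H, y, w, reg=1e-6):
    """H: (n, d) histogram features, y in {-1, +1}, w: non-negative sample weights."""
    def weighted_stats(F, wt):
        wt = wt / wt.sum()
        mean = wt @ F                              # weighted feature mean
        C = F - mean
        cov = (C * wt[:, None]).T @ C              # weighted feature covariance
        return mean, cov

    pos, neg = (y == 1), (y == -1)
    m_pos, S_pos = weighted_stats(H[pos], w[pos])
    m_neg, S_neg = weighted_stats(H[neg], w[neg])
    # Assumed regularization: keeps the scatter matrix invertible for sparse histograms.
    Sw = S_pos + S_neg + reg * np.eye(H.shape[1])
    direction = np.linalg.solve(Sw, m_pos - m_neg)  # FLD projection direction

    # Pick the threshold on the 1-D projection that minimizes the weighted error.
    proj = H @ direction
    s = np.sort(proj)
    thresholds = np.concatenate(([s[0] - 1.0], (s[:-1] + s[1:]) / 2))
    errors = [w[np.where(proj > t, 1, -1) != y].sum() for t in thresholds]
    threshold = thresholds[int(np.argmin(errors))]
    return direction, threshold

def predict_fld(direction, threshold, H):
    """Classify by thresholding the projection onto the FLD direction."""
    return np.where(H @ direction > threshold, 1, -1)
```

Because both the statistics and the threshold use `w`, the learner minimizes the error of weighted samples, which is the property boosting requires.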
Histogram features
  ~10^5 rectangle features
  Histograms over 4 gradient orientations, 4 subdivisions for each rectangle
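One possible reading of a single histogram feature is sketched below: a rectangle is split into cells, and each cell contributes a 4-bin histogram of gradient orientations weighted by gradient magnitude. The 2x2 cell grid, the unsigned orientation range, the L1 normalization, and the function name `orientation_histogram_feature` are assumptions; the slides do not specify them.

```python
import numpy as np

def orientation_histogram_feature(image, x, y, width, height, n_orient=4, grid=(2, 2)):
    """Concatenated orientation histograms of the cells of one rectangle."""
    gy, gx = np.gradient(image.astype(float))      # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation in [0, pi)
    bins = np.minimum((angle / np.pi * n_orient).astype(int), n_orient - 1)

    rows, cols = grid
    parts = []
    for r in range(rows):
        for c in range(cols):
            y0, y1 = y + r * height // rows, y + (r + 1) * height // rows
            x0, x1 = x + c * width // cols, x + (c + 1) * width // cols
            # Magnitude-weighted vote of every pixel in the cell.
            hist = np.bincount(bins[y0:y1, x0:x1].ravel(),
                               weights=magnitude[y0:y1, x0:x1].ravel(),
                               minlength=n_orient)
            parts.append(hist)
    feature = np.concatenate(parts)
    total = feature.sum()
    return feature / total if total > 0 else feature   # L1 normalization (assumed)
```

Enumerating rectangles of many positions and sizes inside the detection window is what would yield a pool on the order of 10^5 candidate features.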
Training data Crop and resize Perturb annotation Increase training set X 10 +
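The annotation perturbation step can be illustrated with a small sketch that jitters one annotated box into ten slightly shifted and rescaled copies, each of which would then be cropped and resized to the training window size. The jitter ranges (`max_shift`, `max_scale`), the uniform sampling, and the function name are hypothetical values chosen for illustration; the slides only state that the training set is increased tenfold.

```python
import random

def perturb_annotation(box, n_copies=10, max_shift=0.05, max_scale=0.05, seed=0):
    """box = (x, y, w, h); returns n_copies randomly jittered boxes."""
    rng = random.Random(seed)                      # fixed seed for reproducibility
    x, y, w, h = box
    jittered = []
    for _ in range(n_copies):
        s = 1.0 + rng.uniform(-max_scale, max_scale)       # relative size change
        dx = rng.uniform(-max_shift, max_shift) * w        # horizontal shift
        dy = rng.uniform(-max_shift, max_shift) * h        # vertical shift
        new_w, new_h = w * s, h * s
        cx, cy = x + w / 2 + dx, y + h / 2 + dy            # perturbed box center
        jittered.append((cx - new_w / 2, cy - new_h / 2, new_w, new_h))
    return jittered
```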
Training: Selected Features
  376 of ~10^5 features selected
  correct classification, 10^-5 false positives
Object detection
  Scan and classify image windows at different positions and scales
  Cluster detections in position-scale space
  Use the cluster size as the detection confidence (e.g. Conf.=5)
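The scan-and-cluster procedure can be sketched as below. The window generator and the greedy clustering rule are assumptions for illustration (the slides do not say which clustering method is used); the tolerances `pos_tol` and `scale_tol`, the scale step, and the stride are hypothetical parameters. Only the final step, confidence equal to cluster size, comes from the slide.

```python
import math

def sliding_windows(img_w, img_h, base=(32, 32), scale_step=1.25, stride_frac=0.25):
    """Yield (x, y, w, h) windows over all positions and scales that fit the image."""
    scale = 1.0
    while base[0] * scale <= img_w and base[1] * scale <= img_h:
        w, h = int(base[0] * scale), int(base[1] * scale)
        stride = max(1, int(stride_frac * min(w, h)))
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield x, y, w, h
        scale *= scale_step

def cluster_detections(detections, pos_tol=0.5, scale_tol=0.5):
    """detections: list of (cx, cy, size). Returns (cx, cy, size, confidence) per cluster."""
    clusters = []
    for det in detections:
        for cluster in clusters:
            n = len(cluster)
            mx = sum(d[0] for d in cluster) / n
            my = sum(d[1] for d in cluster) / n
            ms = math.exp(sum(math.log(d[2]) for d in cluster) / n)  # geometric mean size
            near = math.hypot(det[0] - mx, det[1] - my) <= pos_tol * ms
            similar = abs(math.log(det[2] / ms)) <= scale_tol
            if near and similar:
                cluster.append(det)
                break
        else:
            clusters.append([det])       # no compatible cluster: start a new one
    results = []
    for cluster in clusters:
        n = len(cluster)
        cx = sum(d[0] for d in cluster) / n
        cy = sum(d[1] for d in cluster) / n
        size = math.exp(sum(math.log(d[2]) for d in cluster) / n)
        results.append((cx, cy, size, n))            # confidence = cluster size
    return sorted(results, key=lambda r: -r[3])
```

In this sketch, five overlapping positive windows on one object merge into a single detection with confidence 5, while an isolated positive window keeps confidence 1.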
PASCAL Visual Object Classes Challenge 2005 (VOC’05)
  motorbikes: #217 / #220
  bicycles: #123 / #123
  people: #152 / #149
  cars: #320 / #341
Evaluation criteria
Ground truth annotation
Detection results:
  >50% overlap of bounding box with GT
  one bounding box for each object
  confidence value for each detection
Precision-Recall (PR) curve
Average Precision (AP) value
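The evaluation criteria can be made concrete with a short sketch. It assumes that "overlap" means intersection over union of the two boxes and that AP is the 11-point interpolated average precision; both are assumptions about the challenge protocol rather than statements from the slides. For brevity the sketch handles a single image, and all function names are illustrative.

```python
def overlap_ratio(a, b):
    """Boxes are (x0, y0, x1, y1); returns intersection area over union area."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, min_overlap=0.5):
    """detections: list of (confidence, box). Returns (recall, precision) points."""
    matched = [False] * len(ground_truth)
    tp = fp = 0
    curve = []
    for conf, box in sorted(detections, key=lambda d: -d[0]):   # most confident first
        best, best_i = 0.0, -1
        for i, gt in enumerate(ground_truth):
            o = overlap_ratio(box, gt)
            if o > best:
                best, best_i = o, i
        if best > min_overlap and not matched[best_i]:
            matched[best_i] = True       # one bounding box for each object
            tp += 1
        else:
            fp += 1                      # poor overlap or duplicate detection
        curve.append((tp / len(ground_truth), tp / (tp + fp)))
    return curve

def average_precision(curve):
    """Mean of the best precision at recall >= 0.0, 0.1, ..., 1.0 (assumed definition)."""
    total = 0.0
    for k in range(11):
        total += max((p for r, p in curve if r >= k / 10), default=0.0)
    return total / 11
```

Lowering the confidence threshold step by step traces the PR curve; a second detection of an already matched object counts as a false positive.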
Evaluation of detection
  PR-curves for the “Motorbike” validation dataset: FLD learner + 1-bin classifier
  [Levi and Weiss, CVPR 2004] “Learning object detection from a small number of examples: The importance of good features”
Results for VOC’05 Challenge. PR-curve panels: Bicycles test1, People test1, Cars test1, Motorbikes test1
Results for VOC’05 Challenge. Average Precision values
PASCAL Visual Object Classes Challenge 2006 (VOC’06)
Results for VOC’06 Challenge. Competition “comp3” (train on VOC data). Class “bicycle”: examples
Results for VOC’06 Challenge. Competition “comp3” (train on VOC data). Class “cow”: examples
Results for VOC’06 Challenge. Competition “comp3” (train on VOC data). Class “horse”: examples
Results for VOC’06 Challenge. Competition “comp3” (train on VOC data). Class “motorbike”
Results for VOC’06 Challenge. Competition “comp3” (train on VOC data). Class “person”
Results for VOC’06 Challenge. Average Precision values
  Classes: bicycle, bus, car, cat, cow, dog, horse, motorbike, person, sheep
  Entries: Cambridge, ENSMP, INRIA_Douze, INRIA_Laptev, TUD, TKK
Final Notes
  All results are obtained with a single set of parameters
  A small number of training samples is sufficient
  Efficient detection: 10 fps on 320x280 images
  Extension to texton/color histogram features is straightforward
Open questions:
  Are other free-shape regions better? How to find them?
  Better weak learner that takes advantage of histogram properties
  View transformations
Detection tasks in VOC05 and VOC06 are far from being solved; it is a challenge!