Efficient SVM based object classification and detection. Sreekanth Vempati. Advisors: Dr. C. V. Jawahar (IIIT Hyderabad), Dr. Andrew Zisserman (Univ. of Oxford). 14 December 2010
Large Visual Data: cheap capturing, storage and internet devices
Rapid growth in the amount of data available from video sharing and image sharing sites such as YouTube
Problems
– Scene/Object Classification: find specified categories of scenes/objects. Ex: Is there a bus in this image? Is there a demonstration/protest in this image?
– Object Detection: find the location of specified categories of scenes/objects. Ex: Output the bounding box of the bus in this image
Challenges: intra-class variations; inter-class similarity. Ex: Boat/Ship category, Protest, Flowers, Cityscape
Challenges: viewpoint variation; occlusions/truncations
Scalability
– We need solutions that scale to large amounts of data
– For example, testing 140,000 images with the best-performing setup:
– Feature representation (visual words based, 6300 dimensions) takes ~50 seconds per image -> total time would be ~57 days
– Classification (SVM with non-linear kernel): 20 classes at 3 images/second, a total time of ~10 days
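The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes strictly serial processing on one machine and reads "3 images/second" as a per-class rate; the helper name `serial_days` is hypothetical. Under these assumptions the classification figure comes out close to the quoted ~10 days, while ~50 s per image gives roughly 81 days, so the quoted ~57 days presumably corresponds to a lower average cost of about 35 s per image.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def serial_days(num_items, seconds_per_item):
    """Days needed to process num_items one after another."""
    return num_items * seconds_per_item / SECONDS_PER_DAY


num_images = 140_000

# Feature extraction at the quoted ~50 s per image.
feature_days = serial_days(num_images, 50.0)

# Classification: 20 classes, each scored at 3 images per second.
classification_days = serial_days(num_images * 20, 1.0 / 3.0)

# Per-image cost that would correspond to the quoted ~57 days.
implied_seconds = 57 * SECONDS_PER_DAY / num_images

print(f"feature extraction: {feature_days:.1f} days")
print(f"classification:     {classification_days:.1f} days")
print(f"57 days corresponds to {implied_seconds:.1f} s per image")
```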
Overview
– Large scale semantic concept retrieval in videos
– Modeling subcategories
– Efficient detection by using GRBF feature maps
– Conclusions
1. Semantic video retrieval: given a large set of videos, retrieve the videos of a specific category – Ex: Find all the videos containing soccer
Overview of the approach
– Training: Annotated Video Frames (Example Videos) -> Feature Extraction (Ex: PHOW, PHOG, GIST) -> Classifier (Ex: SVM, Random Forests)
– Testing: Unseen Videos -> Feature Extraction -> Classifier -> Ranked Shots
Features: GIST (Torralba et al., IJCV 01)
– Image divided into an m x m grid
– For each cell, a set of filters (different scales, orientations) is applied
– Final descriptor: average of the filter responses within each block, concatenated over all blocks
Images from “Image Classification for large number of object categories”, Anna Bosch, 2006
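The grid-averaging step of a GIST-style descriptor can be sketched as follows. This is an illustration only: it assumes the filter responses have already been computed and are given as 2-D lists, and the function name `grid_average_descriptor` is hypothetical. A real GIST implementation applies a bank of oriented filters at several scales before this step.

```python
def grid_average_descriptor(response_maps, m):
    """Average each filter response map over an m x m grid of cells."""
    descriptor = []
    for response in response_maps:
        height, width = len(response), len(response[0])
        for gy in range(m):
            y0, y1 = gy * height // m, (gy + 1) * height // m
            for gx in range(m):
                x0, x1 = gx * width // m, (gx + 1) * width // m
                cell = [response[y][x]
                        for y in range(y0, y1)
                        for x in range(x0, x1)]
                descriptor.append(sum(cell) / len(cell))
    return descriptor


# One toy 4x4 "filter response" standing in for a real filter output.
toy_response = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [2, 2, 4, 4],
    [2, 2, 4, 4],
]
gist_like = grid_average_descriptor([toy_response], m=2)
print(gist_like)
```

With several response maps, the per-cell averages of all maps are concatenated into one vector.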
Features: Pyramid Histogram of Oriented Gradients (PHOG). Images from “Image Classification for large number of object categories”, Anna Bosch, 2006
Features: Pyramid Histogram of Visual Words (PHOW)
– Using dense Scale Invariant Feature Transform (SIFT) descriptors
– Vector quantization of the descriptors into visual words
– Pyramid histogram of the visual words
“Beyond bag of features: Spatial pyramid matching for recognizing natural scene categories”, S. Lazebnik et al., CVPR 2006
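A minimal sketch of the two PHOW steps, vector quantization followed by a spatial pyramid of visual-word histograms. The codebook, points and descriptors below are toy values, and the function names are hypothetical. The per-level weights used in spatial pyramid matching are omitted for brevity.

```python
def nearest_word(descriptor, codebook):
    """Index of the closest codeword (squared Euclidean distance)."""
    def dist2(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(codebook)), key=lambda k: dist2(codebook[k]))


def pyramid_histogram(points, descriptors, codebook, levels, width, height):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids."""
    words = [nearest_word(d, codebook) for d in descriptors]
    feature = []
    for level in range(levels):
        cells = 2 ** level
        hist = [[0] * len(codebook) for _ in range(cells * cells)]
        for (x, y), word in zip(points, words):
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            hist[cy * cells + cx][word] += 1
        for cell_hist in hist:
            feature.extend(cell_hist)
    return feature


# Toy data: two visual words, three descriptors in an 8x8 image.
codebook = [(0.0, 0.0), (1.0, 1.0)]
points = [(1, 1), (7, 1), (6, 6)]
descriptors = [(0.1, 0.0), (0.9, 1.0), (0.2, 0.1)]
phow = pyramid_histogram(points, descriptors, codebook,
                         levels=2, width=8, height=8)
print(phow)
```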
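Support Vector Machines (SVM): training points $x_i$ with labels $y_i$, $i = 1, \dots, N$. Figure: separating hyperplane $w^T \phi(x) + b = 0$ with margin hyperplanes $w^T \phi(x) + b = +1$ and $w^T \phi(x) + b = -1$; the normal vector $w$, the offset $b$, the support vectors and a misclassified point are marked.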
SVM formulation. Evaluation function: $f(x) = w^T x + b$
Kernel Trick: use a function $\phi$ that maps the input space to a feature space, and then build the classifier in the feature space.
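A small worked example of the idea, assuming the degree-2 polynomial kernel $K(x, y) = (x^T y)^2$ on 2-D inputs (chosen here only for illustration): the kernel value equals the dot product of the explicitly mapped vectors, so the classifier can work in the feature space without ever forming it.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map for K(x, y) = (x . y)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """The same quantity computed directly in the input space."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
in_feature_space = dot(phi(x), phi(y))
via_kernel = poly2_kernel(x, y)
print(in_feature_space, via_kernel)
```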
Moving to a different space: $f(x) = w^T \phi(x) + b = \sum_i \alpha_i y_i \, \phi(x_i)^T \phi(x) + b$, where $\phi(x_i)^T \phi(x)$ is a dot product in feature space
Kernelizing SVMs: replace the dot product with a kernel function, $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$
Kernels
– Linear
– Polynomial
– Intersection kernel
– Generalized RBF kernel
– Weighted combination of multiple kernels
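The kernels listed above can be written down in their usual textbook forms, sketched here under assumptions: the exact parameterizations on the original slide are not recoverable, so the polynomial offset `c`, the scale `gamma` and the choice of the chi-squared distance inside the generalized RBF kernel are illustrative. The last function is the kernelized SVM evaluation $f(x) = \sum_i \alpha_i y_i K(x_i, x) + b$.

```python
import math


def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, degree=2, c=1.0):
    return (linear(x, y) + c) ** degree


def intersection(x, y):
    return sum(min(a, b) for a, b in zip(x, y))


def chi2_sq_distance(x, y):
    """Squared chi-squared distance between non-negative histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def grbf(x, y, gamma=1.0, sq_distance=chi2_sq_distance):
    """Generalized RBF kernel: exp(-gamma * D^2(x, y))."""
    return math.exp(-gamma * sq_distance(x, y))


def combined(x, y, kernels, weights):
    """Weighted combination of multiple kernels."""
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


def decision(x, support_vectors, alphas, labels, b, kernel):
    """Kernelized SVM evaluation function."""
    return sum(a * yi * kernel(sv, x)
               for sv, a, yi in zip(support_vectors, alphas, labels)) + b


x, y = (0.5, 0.5), (0.25, 0.75)
print(linear(x, y), intersection(x, y), grbf(x, y))
```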
TRECVID competition
– Objective: rank video shots based on the presence of a given concept
– Participated in the High-level Feature Extraction task of TRECVID, organized by NIST, USA
– 2008: around 180 submissions by 40 teams from all over the world
Some of the classes (High-level Feature Extraction): Mountain, Hand, Street, Telephone, Flower, Bridge, Airplane flying, Boat/Ship, Bus, Dog, Cityscape, Classroom, Driver, Two People, Emergency Vehicle, Harbor, Kitchen, Nighttime, Singing, Demonstration/Protest
Data Statistics. Evaluation measure: Average Precision (area under the Precision-Recall curve)
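Average precision for a ranked list can be computed as the mean of the precision values at the ranks of the relevant items. The sketch below uses this common non-interpolated form and assumes the ranked list contains every relevant shot; the official TRECVID evaluation protocol may differ in its details.

```python
def average_precision(ranked_relevance):
    """Non-interpolated AP: mean of the precision at each relevant rank."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Ranked shots: True marks a shot that contains the concept.
ap = average_precision([True, False, True, True, False])
print(round(ap, 4))
```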
Our Approach
– Performance compared using different features and SVM parameters
– PHOW with the intersection kernel is efficient: testing is very fast, with little drop in performance
– Testing time: ~200,000 frames in 10 seconds
“Classification using Intersection Kernel Support Vector Machines is Efficient”, S. Maji, A. Berg and J. Malik, CVPR 2008
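The reason intersection kernel SVMs can be tested quickly is that the decision function decomposes over dimensions, and each one-dimensional term can be evaluated with a binary search over sorted support-vector values. The sketch below illustrates that idea with an exact evaluation. The class name and toy numbers are hypothetical, and this is not the implementation used in the experiments; the cited paper additionally describes a faster piecewise approximation.

```python
from bisect import bisect_right


class FastIntersectionSVM:
    def __init__(self, support_vectors, coeffs, bias):
        # coeffs[j] = alpha_j * y_j for support vector j.
        self.bias = bias
        self.dims = []
        for i in range(len(support_vectors[0])):
            pairs = sorted((sv[i], c) for sv, c in zip(support_vectors, coeffs))
            values = [v for v, _ in pairs]
            # below[r]: sum of c * value over the r smallest values.
            below = [0.0]
            for v, c in pairs:
                below.append(below[-1] + c * v)
            # above[r]: sum of c over all but the r smallest values.
            above = [0.0]
            for _, c in reversed(pairs):
                above.append(above[-1] + c)
            above.reverse()
            self.dims.append((values, below, above))

    def decision(self, x):
        total = self.bias
        for xi, (values, below, above) in zip(x, self.dims):
            r = bisect_right(values, xi)  # number of support values <= xi
            total += below[r] + xi * above[r]
        return total


def naive_decision(x, support_vectors, coeffs, bias):
    """Direct evaluation: one full kernel computation per support vector."""
    return bias + sum(
        c * sum(min(a, b) for a, b in zip(sv, x))
        for sv, c in zip(support_vectors, coeffs)
    )


svs = [(0.2, 0.8), (0.5, 0.5), (0.9, 0.1)]
coeffs = [1.0, -0.5, 0.3]
model = FastIntersectionSVM(svs, coeffs, bias=0.1)
query = (0.4, 0.6)
print(model.decision(query), naive_decision(query, svs, coeffs, 0.1))
```

The naive evaluation costs time proportional to the number of support vectors per dimension, while the sorted version needs only a logarithmic search per dimension.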
Variation with features
Variation with kernels
Results
1. Summary
– Method of visual concept retrieval suitable for large scale data
– PHOW with the fast intersection kernel is very useful
2. Modeling subcategories
Subcategories in real world
What did we achieve?
Structural SVM vs SVM
– Joint feature map between input and output
– Allows the output label to be a complex variable
– Our case: the output is a combination of category and subcategory labels
“Support Vector Machine Learning for Interdependent and Structured Output Spaces”, I. Tsochantaridis et al., ICML 04
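One simple way to build a joint feature map for an output made of a category and a subcategory label is to copy the input into the block of a long vector selected by the label pair. The sketch below assumes this block construction and hand-picked weights for illustration; the joint feature map actually used in the experiments is not given in the slides.

```python
def joint_feature_map(x, category, subcategory, num_categories, num_subcategories):
    """Psi(x, y): copy x into the block indexed by y = (category, subcategory)."""
    dim = len(x)
    psi = [0.0] * (dim * num_categories * num_subcategories)
    block = category * num_subcategories + subcategory
    psi[block * dim:(block + 1) * dim] = x
    return psi


def predict(w, x, num_categories, num_subcategories):
    """Return the (category, subcategory) pair maximizing w . Psi(x, y)."""
    best, best_score = None, float("-inf")
    for c in range(num_categories):
        for s in range(num_subcategories):
            psi = joint_feature_map(x, c, s, num_categories, num_subcategories)
            score = sum(wi * pi for wi, pi in zip(w, psi))
            if score > best_score:
                best, best_score = (c, s), score
    return best


# Hypothetical weights: 2 categories x 2 subcategories, 2-D inputs.
w = [1, 0, 0, 1, -1, 0, 0, -1]
label = predict(w, [0.2, 0.9], num_categories=2, num_subcategories=2)
print(label)
```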
Use of latent variables. “Learning structural SVMs with latent variables”, C. N. Yu et al., ICML 2009
Toy Datasets
Real world datasets: TRECVID 2009 dataset; PASCAL VOC (Visual Object Classes) 2007 Object Detection dataset
Results on TRECVID dataset
Improvement with latent SVM
Effect of the number of subclasses
2. Summary
– Method for modeling subcategories using structural SVM
– Application of latent structural SVM for further improvements
– Improved the performance of the linear kernel
– Performed various experiments on toy and real data
3. Generalized RBF feature maps for Efficient Detection
Object Detection. Ex: aeroplane, horse, bicycle, car, cow, motorbike
Part 3: Outline
– Introduction: Kernels and Feature maps
– Explicit feature maps for GRBF kernels
– Experiments & Results
General framework for detection: Any Image -> Feature representations -> Classifier (Ex: SVM; Linear SVM, Non-linear SVM) -> detections (Ex: Car). “Multiple Kernels for Object Detection”, Vedaldi et al., ICCV 2009; “Cascade Object Detection with Deformable Part Models”, Felzenszwalb et al., CVPR 2010
Kernels
– From faster to more discriminative: Linear SVM; Additive kernels (Ex: intersection kernel); Generalized RBF kernels (Ex: exp-χ² kernel)
– Fast linear SVMs: Stochastic SVM (PEGASOS), Primal SVM (liblinear), One-slack SVM (SVM-perf)
Kernels
– Problem: GRBF kernels, which have high computational complexity, are required to get good performance
– Our solution: approximate generalized RBF kernels with a linear one by using a feature map
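Speeding up non-linear SVMs
– A kernel is a dot product in a high dimensional feature space: $K(x, y) = \langle \Psi(x), \Psi(y) \rangle$
– Define a feature map $\hat{\Psi}$ approximating the kernel: $K(x, y) \approx \langle \hat{\Psi}(x), \hat{\Psi}(y) \rangle$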
Explicit feature maps
– Feature maps for RBF/multiplicative kernels: [Rahimi and Recht, NIPS 07], [F. Li et al., DAGM 2010]
– Feature maps for additive kernels: [Maji and Berg, ICCV 09], [Vedaldi and Zisserman, CVPR 2010], [Perronnin et al., CVPR 2010]
– Our contribution: feature maps for generalized RBF kernels, giving a 2x to 3x speedup with only a little drop in performance
Part 3: Outline
– Introduction: Kernels and Feature maps
– Explicit feature maps for GRBF kernels
– Experiments & Results
Additive kernels: $K(x, y) = \sum_i k(x_i, y_i)$. Examples: Intersection, Hellinger's, χ² kernel
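The three additive kernels named above can be sketched in their standard forms for non-negative histograms. Hellinger's kernel is a convenient first example of an explicit feature map, because the element-wise square root reproduces it exactly with a plain dot product. Function names here are illustrative.

```python
import math


def intersection_kernel(x, y):
    return sum(min(a, b) for a, b in zip(x, y))


def hellinger_kernel(x, y):
    return sum(math.sqrt(a * b) for a, b in zip(x, y))


def chi2_kernel(x, y):
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


def hellinger_map(x):
    """Exact feature map for Hellinger's kernel: element-wise square root."""
    return [math.sqrt(a) for a in x]


x = [0.1, 0.4, 0.5]
y = [0.3, 0.3, 0.4]
linear_on_map = sum(a * b for a, b in zip(hellinger_map(x), hellinger_map(y)))
print(linear_on_map, hellinger_kernel(x, y))
```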
Feature maps for additive kernels [Vedaldi & Zisserman 10]: a closed-form function, approximated by sampling. “Efficient Additive Kernels via Explicit Feature Maps”, A. Vedaldi and A. Zisserman, CVPR 2010
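For the χ² kernel $k(a, b) = 2ab/(a+b)$, the closed-form map can be approximated by sampling the kernel spectrum $\kappa(\lambda) = \mathrm{sech}(\pi\lambda)$ at regular intervals. The sketch below follows that construction; the number of samples `n` and the sampling period `L` are illustrative values chosen for accuracy rather than speed, not the settings used in the experiments.

```python
import math


def chi2_feature_map(x, n=30, L=0.25):
    """Approximate map for the chi2 kernel k(a, b) = 2ab / (a + b).

    Samples kappa(lam) = sech(pi * lam) at lam = 0, L, ..., n * L and
    returns 2n + 1 values per input dimension.
    """
    features = []
    for a in x:
        if a <= 0.0:
            features.extend([0.0] * (2 * n + 1))
            continue
        log_a = math.log(a)
        features.append(math.sqrt(a * L))  # kappa(0) = 1
        for j in range(1, n + 1):
            lam = j * L
            kappa = 1.0 / math.cosh(math.pi * lam)
            scale = math.sqrt(2.0 * a * L * kappa)
            features.append(scale * math.cos(lam * log_a))
            features.append(scale * math.sin(lam * log_a))
    return features


def chi2_kernel(x, y):
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


x = [0.2, 0.3, 0.5]
y = [0.7, 0.1, 0.2]
approx = sum(p * q for p, q in zip(chi2_feature_map(x), chi2_feature_map(y)))
exact = chi2_kernel(x, y)
print(approx, exact)
```

In practice far fewer samples per dimension are used, trading a small approximation error for a much shorter feature vector.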
Feature maps for RBF kernels: Random Fourier features [Rahimi & Recht 07]. “Random Features for Large-Scale Kernel Machines”, Ali Rahimi, Ben Recht, NIPS 2007
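Random Fourier features approximate the Gaussian RBF kernel by projecting the input onto random directions and taking cosines. A minimal sketch, assuming the parameterization $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$, for which the projection directions are drawn from a Gaussian with variance $2\gamma$; the seed and number of projections are arbitrary.

```python
import math
import random


def make_rff(dim, num_features, gamma, seed=0):
    """Random Fourier features for K(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)  # directions are drawn from N(0, 2 * gamma * I)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w * a for w, a in zip(freq, x)) + phase)
            for freq, phase in zip(freqs, phases)
        ]

    return feature_map


def rbf_kernel(x, y, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


rff = make_rff(dim=3, num_features=4000, gamma=0.5)
x, y = [0.1, 0.4, 0.5], [0.6, 0.2, 0.9]
approx = sum(p * q for p, q in zip(rff(x), rff(y)))
exact = rbf_kernel(x, y, gamma=0.5)
print(approx, exact)
```

The approximation error shrinks as the number of projections grows, which is why both accuracy and testing time depend on that number.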
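Generalized RBF kernels
– Definition: an RBF kernel computed on the distance induced by an additive kernel
– Trick: express this distance in terms of the feature map of the additive kernel
– Example for the χ² distance: χ² kernel -> χ² distance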
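The definition and the trick can be written out as follows. The notation is an assumption, since the slide's formulas did not survive: $k$ is the additive kernel, $\Psi$ its feature map, $\hat{\Psi}$ the approximate map, and $\gamma = 1/(2\sigma^2)$ the RBF scale.

```latex
\begin{align*}
K(x, y) &= \exp\!\left(-\gamma\, D^2(x, y)\right)
  && \text{(generalized RBF kernel)} \\
D^2(x, y) &= k(x, x) + k(y, y) - 2\,k(x, y)
  = \lVert \Psi(x) - \Psi(y) \rVert^2
  && \text{(distance induced by } k\text{)} \\
K(x, y) &\approx \exp\!\left(-\gamma\, \lVert \hat{\Psi}(x) - \hat{\Psi}(y) \rVert^2\right)
  && \text{(RBF kernel on mapped vectors)} \\[4pt]
k_{\chi^2}(x, y) &= \sum_i \frac{2\, x_i y_i}{x_i + y_i}
  \;\Longrightarrow\;
  D^2_{\chi^2}(x, y) = \sum_i \frac{(x_i - y_i)^2}{x_i + y_i}
  && \text{(}\chi^2\text{ example)}
\end{align*}
```

Because the last approximation is an ordinary Gaussian RBF kernel on the vectors $\hat{\Psi}(x)$, any feature map for RBF kernels can be applied to them.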
GRBF feature maps algorithm
Part 3: Outline
– Introduction: Kernels and Feature maps
– Explicit feature maps for generalized RBF kernels
– Experiments & Results
Experimental Setup
– PASCAL VOC (Visual Object Classes) 2007 Object Detection dataset, 20 object categories
– Cascade of classifiers [Vedaldi et al., ICCV 2009]: Linear SVM, Additive Kernel SVM, exp-χ² kernel SVM
– PHOG features
– exp-χ² feature map SVM
Approximate vs Exact Kernels: average precision and testing time both increase with the number of projections
A large number of projections is required for good performance. Additional improvement in testing time comes from sparse classifiers:
– SVM sparse: L1-regularized SVM with L2 loss function
– LR sparse: L1-regularized logistic regression loss function
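For reference, the two sparse classifiers are commonly written as the following objectives. These are the standard forms (as used, for example, by liblinear), stated here as an assumption since the slide's own formulas are not available; $x_i$ denotes the feature-mapped training vector and $y_i \in \{-1, +1\}$ its label.

```latex
\begin{align*}
\text{SVM sparse:} \quad
  & \min_{w}\; \lVert w \rVert_1
    + C \sum_{i=1}^{N} \max\!\left(0,\; 1 - y_i\, w^T x_i\right)^2 \\
\text{LR sparse:} \quad
  & \min_{w}\; \lVert w \rVert_1
    + C \sum_{i=1}^{N} \log\!\left(1 + e^{-y_i\, w^T x_i}\right)
\end{align*}
```

The $\ell_1$ penalty drives many components of $w$ to exactly zero, and a projection whose weight is zero never needs to be computed at test time.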
Speedup with L1 regularization
Effect of C on sparsity
– Parameter to control sparsity: SVM parameter C (recall the SVM objective function)
– Smaller C gives a sparser solution with only a slight drop in performance
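The effect of C on sparsity can be seen on a toy problem. The sketch below is an illustration under assumptions: it uses a simple proximal gradient method on L1-regularized logistic regression rather than the solver used in the experiments, and the data, step size and values of C are made up.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_logreg(data, labels, C, steps=500, lr=0.05):
    """Proximal gradient for min_w ||w||_1 + C * sum_i log(1 + exp(-y_i w.x_i))."""
    dim = len(data[0])
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in zip(data, labels):
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            coeff = -y / (1.0 + math.exp(margin))
            for k in range(dim):
                grad[k] += C * coeff * x[k]
        w = [soft_threshold(wi - lr * g, lr) for wi, g in zip(w, grad)]
    return w


def count_nonzeros(w):
    return sum(1 for wi in w if wi != 0.0)


# Toy data: mainly the first feature carries the label.
data = [[1.0, 0.2, -0.1], [0.8, -0.3, 0.2], [-1.0, 0.1, 0.1], [-0.9, -0.2, -0.2]]
labels = [1, 1, -1, -1]

w_small_c = l1_logreg(data, labels, C=0.1)
w_large_c = l1_logreg(data, labels, C=10.0)
print(count_nonzeros(w_small_c), count_nonzeros(w_large_c))
```

With a very small C the penalty dominates and every weight stays at zero; increasing C lets the informative weights become non-zero.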
Example Results
Results on all the 20 categories: SVM dense is faster than exact exp-χ² and performs better than χ²
Results on all the 20 categories: LR sparse is 2 to 3 times faster than exact exp-χ² and performs better than χ²
3. Summary
– Feature maps for generalized RBF kernels
– Method for reducing the number of projections
– Results on VOC 2007: nearly 2x to 3x speedup with a slight loss in performance
Conclusions
– Proposed efficient methods based on SVM for visual scene/object categorization and detection
– Validated these methods on a large amount of data
– Further work: porting these techniques to GPUs, and including temporal information to improve average precision
Publications
– Generalized RBF feature maps for efficient detection, Sreekanth Vempati, Andrea Vedaldi, Andrew Zisserman, C. V. Jawahar. 21st British Machine Vision Conference (BMVC), 2010 (Oral Presentation), Aberystwyth, UK
– Oxford/IIIT TRECVID 2009 - Notebook paper, Sreekanth Vempati, Mihir Jain, Omkar M. Parkhi, C. V. Jawahar, Andrea Vedaldi, Marcin Marszalek, Andrew Zisserman. TRECVID 2009 Workshop, Gaithersburg, Md., USA
– Oxford/IIIT TRECVID 2008 - Notebook paper, James Philbin, Manuel Marin-Jimenez, Siddharth Srinivasan, Andrew Zisserman, Mihir Jain, Sreekanth Vempati, Pramod Sankar and C. V. Jawahar. TRECVID 2008 Workshop, Gaithersburg, Md., USA
Thank You
GRBF-Algorithm
1. Compute the approximate feature map corresponding to the additive kernel
2. Compute the RBF feature map using the output of step 1 as the input vector
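The two steps can be sketched end to end for the exp-χ² kernel. This is a minimal illustration under stated assumptions, not the implementation from the experiments: step 1 uses a sampled χ² map with illustrative parameters `n` and `L`, step 2 uses random Fourier features with an arbitrary seed and number of projections, and the kernel is parameterized as $\exp(-\gamma D^2_{\chi^2}(x, y))$.

```python
import math
import random


def chi2_feature_map(x, n=20, L=0.3):
    """Step 1: approximate feature map for the additive chi2 kernel."""
    features = []
    for a in x:
        if a <= 0.0:
            features.extend([0.0] * (2 * n + 1))
            continue
        log_a = math.log(a)
        features.append(math.sqrt(a * L))
        for j in range(1, n + 1):
            lam = j * L
            scale = math.sqrt(2.0 * a * L / math.cosh(math.pi * lam))
            features.append(scale * math.cos(lam * log_a))
            features.append(scale * math.sin(lam * log_a))
    return features


def make_rff(dim, num_features, gamma, seed=0):
    """Step 2: random Fourier features for exp(-gamma * ||u - v||^2)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(u):
        return [scale * math.cos(sum(w * a for w, a in zip(f, u)) + p)
                for f, p in zip(freqs, phases)]

    return feature_map


def grbf_feature_map(x, rff):
    """Apply the RBF map to the output of the additive map."""
    return rff(chi2_feature_map(x))


def exp_chi2_kernel(x, y, gamma):
    d2 = sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)
    return math.exp(-gamma * d2)


x = [0.2, 0.3, 0.5]
y = [0.7, 0.1, 0.2]
gamma = 1.0
rff = make_rff(dim=len(chi2_feature_map(x)), num_features=4000, gamma=gamma)
approx = sum(p * q for p, q in
             zip(grbf_feature_map(x, rff), grbf_feature_map(y, rff)))
exact = exp_chi2_kernel(x, y, gamma)
print(approx, exact)
```

A linear SVM trained on the output of `grbf_feature_map` then stands in for the exact exp-χ² kernel SVM.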
Choice of features (using exact kernels; PHOG vs PHOW)
– PHOG features are used in our experiments
– exp-χ² performs better than χ²
Sparsity vs Performance