Sreekanth Vempati ( 200402044 ) Advisors: Dr. C. V. Jawahar ( IIIT Hyderabad ), Dr. Andrew Zisserman ( Univ. of Oxford ) Efficient SVM based object classification.

Presentation transcript:

Sreekanth Vempati. Advisors: Dr. C. V. Jawahar (IIIT Hyderabad), Dr. Andrew Zisserman (Univ. of Oxford). Efficient SVM based object classification and detection. 14 December 2010

Large Visual Data: cheap capturing, storage and internet devices

Rapid growth in the amount of data available through video sharing and image sharing, for example in the case of YouTube

Problems – Scene/Object Classification: find specified categories of scenes/objects. Is there a bus in this image? Is there a demonstration/protest in this image? – Object Detection: find the location of specified categories of scenes/objects. Output the bounding box of the bus in this image.

Challenges – Intra-class variations – Inter-class similarity. Ex: Boat/Ship category, Protest, Flowers, Cityscape

Challenges – Viewpoint variation – Occlusions/Truncations

Scalability: We need solutions that scale to large amounts of data. For example, if we have to test 140,000 images at the best-performing settings: – Feature representation (visual words based, 6300 dimensions) takes ~50 seconds -> total time would be ~57 days – Classification (SVM with non-linear kernel, 20 classes) at 3 images/second -> a total time of ~10 days

Overview – Large scale semantic concept retrieval in videos – Modeling subcategories – Efficient detection by using GRBF feature maps – Conclusions

1. Semantic video retrieval: Given a large set of videos, retrieve the videos of a specific category – Ex: Find all the videos containing soccer

Overview of the approach – Training: Example Videos -> Annotated Video Frames -> Feature Extraction (Ex: PHOW, PHOG, GIST) -> Classifier (Ex: SVM, Random Forests) – Testing: Unseen Videos -> Feature Extraction -> Classifier -> Ranked Shots

Features: GIST – Torralba et al., IJCV 01 – Image divided into an m x m grid – For each cell, a set of filters (different scales, orientations) is applied – Final descriptor: average of the filter responses over all blocks. Images from “Image Classification for large number of object categories”, Anna Bosch, 2006

Features: Pyramid Histogram of Oriented Gradients (PHOG). Images from “Image Classification for large number of object categories”, Anna Bosch, 2006

Features: Pyramid Histogram of Visual Words (PHOW) – Using dense SIFT (Scale Invariant Feature Transform) descriptors – Vector Quantization. “Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories”, S. Lazebnik et al., CVPR 2006

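Support Vector Machines (SVM): training points $x_i$, $i = 1, \dots, N$, with labels $y_i$, $i = 1, \dots, N$. Separating hyperplane $w^T x + b = 0$ with normal $w$ and offset $b$; margin hyperplanes $w^T x + b = +1$ and $w^T x + b = -1$. Figure labels: support vector, misclassified point, $\xi = 0$, $\xi < 1$.

SVM formulation: evaluation function $f(x) = w^T x + b$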
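For reference, the slide's own formulation was an image and did not survive in the transcript. The block below is the textbook soft-margin primal, not necessarily the exact form shown in the talk; the slack variables $\xi_i$ and the trade-off parameter $C$ follow the usual convention and are assumptions here.

```latex
\min_{w,\, b,\, \xi} \;\; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad y_i \,(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N
```

Under this form, a point with $\xi_i = 0$ lies on or outside the margin, $0 < \xi_i < 1$ is a margin violation that is still correctly classified, and $\xi_i > 1$ is a misclassified point.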

Kernel Trick: Use a function $\Psi$ that maps the input space to a feature space, and then build the classifier in the feature space.

Moving to a different space: $f(x) = w^T \Psi(x) + b = \sum_i \alpha_i y_i \, \Psi(x_i)^T \Psi(x) + b$, where $\Psi(x_i)^T \Psi(x)$ is a dot product in feature space.

Kernelizing SVMs: Replace the dot product with a kernel function.
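A minimal sketch of the kernelized evaluation function, assuming the usual dual form in which the dot product is swapped for a kernel call. The names `kernel_decision`, `linear_kernel` and the toy numbers are illustrative and do not come from the slides. With a linear kernel the result should match $w^T x + b$ for $w = \sum_i \alpha_i y_i x_i$.

```python
def linear_kernel(u, v):
    """Plain dot product; any other kernel can be passed in its place."""
    return sum(a * b for a, b in zip(u, v))


def kernel_decision(x, support_vectors, alphas, labels, b, kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(s, x)
               for s, a, y in zip(support_vectors, alphas, labels)) + b


# Toy values (hypothetical, for illustration only).
support_vectors = [[1.0, 2.0], [-1.0, 0.5]]
alphas = [0.3, 0.7]
labels = [1, -1]
b = 0.1
x = [0.5, -1.0]

# Explicit weight vector w = sum_i alpha_i * y_i * x_i for the linear case.
w = [sum(a * y * s[d] for s, a, y in zip(support_vectors, alphas, labels))
     for d in range(2)]
f_kernel = kernel_decision(x, support_vectors, alphas, labels, b, linear_kernel)
f_primal = linear_kernel(w, x) + b
```

The cost of the kernelized form grows with the number of support vectors, which is the motivation for the fast evaluation schemes discussed later in the talk.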

Kernels – Linear – Polynomial – Intersection kernel – Generalized RBF kernel – Weighted combination of multiple kernels
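The formulas on this slide were images, so the definitions below are the commonly used forms rather than a transcription. Function names and default parameters (`degree`, `c`, `gamma`) are assumptions, and the $\chi^2$ distance is written without the factor of 1/2 that some authors include.

```python
import math


def linear(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial(x, y, degree=2, c=1.0):
    return (linear(x, y) + c) ** degree


def intersection(x, y):
    """Histogram intersection: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def chi2_distance(x, y):
    """Chi-squared distance between non-negative histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(x, y) if a + b > 0)


def exp_chi2(x, y, gamma=1.0):
    """A generalized RBF kernel: exponentiated negative chi-squared distance."""
    return math.exp(-gamma * chi2_distance(x, y))


def combined(x, y, kernels, weights):
    """Weighted combination of several kernels."""
    return sum(w * k(x, y) for k, w in zip(kernels, weights))


x = [0.2, 0.8]
y = [0.5, 0.5]
k_mix = combined(x, y, [linear, intersection], [0.5, 0.5])
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why the last form can be used directly inside an SVM.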

TRECVID competition – Objective: rank video shots based on the presence of a given concept – Participated in High-level Feature Extraction, TRECVID, organized by NIST, USA – 2008: around 180 submissions by 40 teams from all over the world

Some of the classes (High-level Feature Extraction): Mountain, Hand, Street, Telephone, Flower, Bridge, Airplane flying, Boat/Ship, Bus, Dog, Cityscape, Classroom, Driver, Two People, Emergency Vehicle, Harbor, Kitchen, Nighttime, Singing, Demonstration/Protest

Data Statistics. Evaluation Measure: Average Precision, the area under the Precision-Recall curve
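As a rough illustration of the evaluation measure, the sketch below computes non-interpolated average precision for a ranked list: the mean of the precision values at each rank where a relevant shot appears. This is one common way to approximate the area under the precision-recall curve; the exact variant used in the TRECVID evaluation may differ, and the function name is an assumption.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean precision at the rank of each relevant item."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, relevant) in enumerate(ranked, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


# Four shots ranked by classifier score; 1 marks a shot containing the concept.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Here the relevant shots sit at ranks 1 and 3, so the value is (1/1 + 2/3) / 2.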

Our Approach – Performance compared using different features and SVM parameters – Use of PHOW with the Intersection kernel is efficient – Testing is very fast, with little drop in performance – Testing time: ~2 lakh (200,000) frames in 10 seconds. “Classification using Intersection kernel SVMs is efficient”, S. Maji, A. Berg and J. Malik, CVPR 2008
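The speed of intersection kernel testing comes from the fact that the decision function splits into one function per dimension, which can be evaluated with a binary search over sorted support-vector values instead of a pass over all support vectors. The sketch below shows that exact variant only; names such as `build_tables` are hypothetical, and the cited paper additionally describes a piecewise approximation that is not reproduced here.

```python
from bisect import bisect_right


def naive_decision(x, support_vectors, coefs, b):
    """Direct evaluation: sum_l c_l * sum_d min(x_d, s_ld) + b, with c_l = alpha_l * y_l."""
    return b + sum(c * sum(min(xd, sd) for xd, sd in zip(x, s))
                   for s, c in zip(support_vectors, coefs))


def build_tables(support_vectors, coefs):
    """Per dimension: sorted values plus running sums of c*s and of c."""
    tables = []
    for d in range(len(support_vectors[0])):
        pairs = sorted((s[d], c) for s, c in zip(support_vectors, coefs))
        values = [v for v, _ in pairs]
        prefix = [0.0]  # prefix[r] = sum of c*s over the r smallest values
        weight = [0.0]  # weight[r] = sum of c over the r smallest values
        for v, c in pairs:
            prefix.append(prefix[-1] + c * v)
            weight.append(weight[-1] + c)
        tables.append((values, prefix, weight))
    return tables


def fast_decision(x, tables, b):
    """One binary search per dimension instead of a loop over support vectors."""
    total = b
    for xd, (values, prefix, weight) in zip(x, tables):
        r = bisect_right(values, xd)  # number of support values <= xd
        # Values <= xd contribute c*s; the rest contribute c*xd.
        total += prefix[r] + xd * (weight[-1] - weight[r])
    return total


svs = [[0.1, 0.7, 0.2], [0.4, 0.4, 0.2], [0.6, 0.1, 0.3], [0.3, 0.3, 0.4]]
coefs = [0.5, -0.2, 0.8, -0.6]
tables = build_tables(svs, coefs)
query = [0.35, 0.25, 0.4]
```

The tables are built once after training, so the per-image cost at test time is logarithmic rather than linear in the number of support vectors.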

Variation with features

Variation with kernels

Results. More Results

1. Summary – A method of visual concept retrieval suitable for large-scale data – PHOW with the fast intersection kernel is very useful

2. Modeling subcategories

Subcategories in the real world

What did we achieve?

Structural SVM vs SVM – Joint feature map between input and output – Allows the output label to be a complex variable – Our case: use a combination of category and subcategory labels as the output. “Support Vector Learning for Interdependent and Structured Output Spaces”, I. Tsochantaridis et al., ICML 04
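The slides do not spell out the joint feature map, so the following is only one plausible sketch: the input vector is copied into the block of a longer vector selected by the (category, subcategory) pair, in the style of multiclass structural SVMs. The block layout, function names and toy weights are all assumptions.

```python
def joint_feature_map(x, category, subcategory, n_categories, n_subcategories):
    """Place x in the block indexed by (category, subcategory); zeros elsewhere."""
    d = len(x)
    psi = [0.0] * (n_categories * n_subcategories * d)
    start = (category * n_subcategories + subcategory) * d
    psi[start:start + d] = x
    return psi


def predict(w, x, n_categories, n_subcategories):
    """Return the (category, subcategory) pair maximizing w . Psi(x, y)."""
    best = None
    for c in range(n_categories):
        for s in range(n_subcategories):
            psi = joint_feature_map(x, c, s, n_categories, n_subcategories)
            score = sum(wi * pi for wi, pi in zip(w, psi))
            if best is None or score > best[0]:
                best = (score, c, s)
    return best[1], best[2]


# Hypothetical weights for 2 categories x 2 subcategories, 2-dimensional inputs.
w = [1.0, 0.0, 0.0, 1.0, -1.0, 0.0, 0.0, -1.0]
label_a = predict(w, [0.2, 0.9], 2, 2)
label_b = predict(w, [-1.0, 0.1], 2, 2)
```

With this layout each (category, subcategory) pair effectively gets its own linear classifier, and the category decision is taken from the best-scoring subcategory.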

Use of latent variables. “Learning structural SVMs with latent variables”, C. N. Yu et al., ICML 2009

Toy Datasets

Real world datasets – TRECVID 2009 dataset – PASCAL VOC (Visual Object Classes) 2007: Object Detection dataset

Results on TRECVID dataset

Improvement with latent SVM

Effect of the number of subclasses

2. Summary – A method for modeling subcategories using structural SVM – Application of latent structural SVM for further improvements – Improved the performance of the linear kernel – Performed various experiments on toy and real data

3. Generalized RBF feature maps for Efficient Detection

Object Detection. Example categories: aeroplane, horse, bicycle, car, cow, motorbike

Part 3: Outline – Introduction: Kernels and Feature maps – Explicit feature maps for GRBF kernels – Experiments & Results

General Framework for detection: Any Image -> Feature representations -> Classifier (Ex: SVM; Linear SVM, Non-linear SVM) -> detections (Ex: Car). “Multiple Kernels for Object Detection”, Vedaldi et al., ICCV 2009; “Cascade Object Detection with Deformable Part Models”, Felzenszwalb et al., CVPR 2010

Kernels, ordered from faster to more discriminative: Linear SVM; Additive kernels (Ex: intersection kernel); Generalized RBF kernels (Ex: exp-$\chi^2$ kernel). Fast Linear SVMs – Stochastic SVM (PEGASOS) – Primal SVM (liblinear) – One-slack SVM (SVM-perf)

Kernels – Problem: GRBF kernels, which have high computational complexity, are required to get good performance – Our Solution: approximate Generalized RBF kernels with a linear one by using a feature map

Speeding up non-linear SVMs – A kernel is a dot product in a high-dimensional feature space – Define a feature map approximating the kernel

Explicit feature maps – Feature maps for RBF/multiplicative kernels: [Rahimi and Recht, NIPS 07], [F. Li et al., DAGM 2010] – Feature maps for additive kernels: [Maji and Berg, ICCV 09], [Vedaldi and Zisserman, CVPR 2010], [Perronnin et al., CVPR 2010] – Our Contribution: feature maps for generalized RBF kernels, 2x to 3x speedup (only a little drop in performance)

Part 3: Outline – Introduction: Kernels and Feature maps – Explicit feature maps for GRBF kernels – Experiments & Results

Additive kernels – Examples: $\chi^2$, Intersection, Hellinger’s kernel

Feature maps for additive kernels [Vedaldi & Zisserman 10]: closed-form function; additive kernel maps approximated by sampling. “Efficient Additive Kernels via Explicit Feature Maps”, A. Vedaldi and A. Zisserman, CVPR 2010
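To make "closed form, approximated by sampling" concrete, here is a small sketch of a sampled feature map for the additive $\chi^2$ kernel $k(x, y) = 2xy/(x+y)$, whose spectrum is $\kappa(\omega) = \mathrm{sech}(\pi\omega)$. The sampling step `L`, the number of samples `n`, and the function names are assumptions chosen for illustration; they are not values taken from the slides.

```python
import math


def chi2_kernel(x, y):
    """Exact additive chi-squared kernel on non-negative histograms."""
    return sum(2.0 * a * b / (a + b) for a, b in zip(x, y) if a + b > 0)


def chi2_feature_map(x, n=2, L=0.5):
    """Map each scalar to 2n+1 values whose dot products approximate the kernel."""
    out = []
    for v in x:
        if v <= 0:
            out.extend([0.0] * (2 * n + 1))
            continue
        log_v = math.log(v)
        out.append(math.sqrt(v * L))  # spectrum at frequency 0 equals 1
        for j in range(1, n + 1):
            kappa = 1.0 / math.cosh(math.pi * j * L)  # sech(pi * j * L)
            amplitude = math.sqrt(2.0 * v * L * kappa)
            out.append(amplitude * math.cos(j * L * log_v))
            out.append(amplitude * math.sin(j * L * log_v))
    return out


x = [0.2, 0.8]
y = [0.5, 0.5]
approx = sum(a * b for a, b in zip(chi2_feature_map(x), chi2_feature_map(y)))
exact = chi2_kernel(x, y)
```

Each input dimension expands to only `2n + 1` values, so a linear SVM on the mapped features stays cheap while approximating the additive kernel.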

Feature maps for RBF kernels: Random Fourier features [Rahimi & Recht 07]. “Random Features for Large-Scale Kernel Machines”, Ali Rahimi and Ben Recht, NIPS 2007
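A minimal sketch of random Fourier features for the Gaussian RBF kernel $\exp(-\gamma \lVert x - y \rVert^2)$, assuming the standard construction: random projections drawn from a normal distribution with variance $2\gamma$, random phase offsets, and cosine features. The helper names and the number of projections are illustrative choices.

```python
import math
import random


def make_rff(dim, n_projections, gamma, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)
    directions = [[rng.gauss(0.0, std) for _ in range(dim)]
                  for _ in range(n_projections)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_projections)]
    scale = math.sqrt(2.0 / n_projections)

    def feature_map(x):
        return [scale * math.cos(sum(w * v for w, v in zip(row, x)) + o)
                for row, o in zip(directions, offsets)]

    return feature_map


def rbf_kernel(x, y, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


gamma = 2.0
z = make_rff(dim=3, n_projections=4000, gamma=gamma)
x = [0.1, 0.4, 0.3]
y = [0.3, 0.2, 0.5]
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = rbf_kernel(x, y, gamma)
```

The approximation error shrinks roughly with the square root of the number of projections, which is why accuracy and testing time both grow as more projections are used.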

Generalized RBF kernels – Definition – Trick: write the distance in terms of a feature map – Example for $\chi^2$ distance: kernel, distance
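The formulas on this slide were images. A reconstruction in the usual notation is sketched below; the scale parameter $\gamma$ and the symbol $\Psi$ for the feature map of the underlying additive kernel $K$ are assumptions, and the talk may have used a different parameterization.

```latex
K_{\mathrm{GRBF}}(x, y) = \exp\!\big(-\gamma\, D^2(x, y)\big), \qquad
D^2(x, y) = K(x, x) + K(y, y) - 2K(x, y) = \lVert \Psi(x) - \Psi(y) \rVert^2

% Example: the additive \chi^2 kernel and the distance it induces
K_{\chi^2}(x, y) = \sum_d \frac{2\, x_d\, y_d}{x_d + y_d}, \qquad
D^2_{\chi^2}(x, y) = \sum_d \frac{(x_d - y_d)^2}{x_d + y_d}
```

The second identity follows term by term from $x_d + y_d - \frac{4 x_d y_d}{x_d + y_d} = \frac{(x_d - y_d)^2}{x_d + y_d}$, so the exp-$\chi^2$ kernel is an ordinary RBF kernel applied to $\Psi(x)$.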

GRBF feature maps algorithm

Outline – Introduction: Kernels and Feature maps – Explicit feature maps for generalized RBF kernels – Experiments & Results

Experimental Setup – PASCAL VOC (Visual Object Classes) 2007: Object Detection dataset, 20 object categories – PHOG features – Cascade of classifiers [Vedaldi et al., ICCV 2009]: Linear SVM, Additive Kernel SVM, exp-chi2 kernel SVM – exp-chi2 feature map SVM

Approximate vs Exact Kernels: Average precision increases, but testing time also increases, with the number of projections.

Approximate vs Exact Kernels: Average Precision & Testing Time increase with the number of projections

A large number of projections is required for good performance. Additional improvement in testing time from sparse solutions: – SVM SPARSE: L1-regularized, L2-loss SVM – LR SPARSE: L1-regularized logistic regression

Speedup with $\ell_1$ regularization

Effect of C on Sparsity – Parameter to control sparsity: SVM parameter C (recall the SVM objective function) – Smaller C gives a sparser solution with only a slight drop in performance
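The objective recalled on this slide was an image. As a hedged reference, the L1-regularized objectives in the form used by solvers such as liblinear are written below; that the experiments used exactly these forms is an assumption.

```latex
% SVM sparse: L1-regularized, L2-loss SVM
\min_{w} \;\; \lVert w \rVert_1 + C \sum_{i=1}^{N} \max\!\big(0,\; 1 - y_i\, w^T x_i\big)^2

% LR sparse: L1-regularized logistic regression
\min_{w} \;\; \lVert w \rVert_1 + C \sum_{i=1}^{N} \log\!\big(1 + e^{-y_i\, w^T x_i}\big)
```

In both cases a smaller $C$ gives the $\lVert w \rVert_1$ term more influence, which drives more weights to exactly zero; zero weights mean the corresponding projections never need to be computed at test time.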

Example Results

Results on all the 20 categories: SVM dense is faster than exact exp-$\chi^2$ and performs better than $\chi^2$

Results on all the 20 categories: LR sparse is 2 to 3 times faster than exact exp-$\chi^2$ and performs better than $\chi^2$

3. Summary – Feature maps for generalized RBF kernels – Method for reducing the number of projections – Results on VOC 2007: nearly 2x to 3x speedup with a slight loss in performance

Conclusions – Proposed efficient methods based on SVM for visual scene/object categorization and detection – Validated these methods on a large amount of data – Future work: porting these techniques to GPUs; including temporal information to improve average precision. Example images: Aeroplane, Motorbike

Publications
– Generalized RBF feature maps for efficient detection, Sreekanth Vempati, Andrea Vedaldi, Andrew Zisserman, C. V. Jawahar. 21st British Machine Vision Conference (BMVC), 2010 (Oral Presentation), Aberystwyth, UK.
– 2009: Oxford/IIIT TRECVID Notebook paper, Sreekanth Vempati, Mihir Jain, Omkar M. Parkhi, C. V. Jawahar, Andrea Vedaldi, Marcin Marszalek, Andrew Zisserman. TRECVID 2009 Workshop, Gaithersburg, Md., USA.
– 2008: Oxford/IIIT TRECVID Notebook paper, James Philbin, Manuel Marin-Jimenez, Siddharth Srinivasan, Andrew Zisserman, Mihir Jain, Sreekanth Vempati, Pramod Sankar and C. V. Jawahar. TRECVID 2008 Workshop, Gaithersburg, Md., USA.

Thank You

Object Detection: Should we put our results or this groundtruth?

GRBF-Algorithm – 1. Compute $\hat{\Psi}(x)$, the approximate feature map corresponding to the additive kernel – 2. Compute the RBF feature map using $\hat{\Psi}(x)$ as the input vector
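The two steps can be sketched as a composition of feature maps. To keep the example short, Hellinger's kernel is used for step 1 because its feature map (the element-wise square root) is exact; this is a substitution for illustration, not the exp-$\chi^2$ setup used in the experiments. Step 2 uses random Fourier features. All names and parameter values are assumptions.

```python
import math
import random


def hellinger_map(x):
    """Step 1 (stand-in): exact feature map of Hellinger's additive kernel."""
    return [math.sqrt(v) for v in x]


def make_rff(dim, n_projections, gamma, seed=0):
    """Step 2: random Fourier features for exp(-gamma * ||u - v||^2)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)
    rows = [[rng.gauss(0.0, std) for _ in range(dim)] for _ in range(n_projections)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_projections)]
    scale = math.sqrt(2.0 / n_projections)
    return lambda u: [scale * math.cos(sum(w * v for w, v in zip(r, u)) + o)
                      for r, o in zip(rows, offsets)]


def make_grbf_map(dim, n_projections, gamma, seed=0):
    """Compose the two steps: RBF feature map applied to the additive map."""
    rff = make_rff(dim, n_projections, gamma, seed)
    return lambda x: rff(hellinger_map(x))


def exact_grbf(x, y, gamma):
    """exp(-gamma * D^2), with D^2 the squared Hellinger-style distance."""
    d2 = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)


grbf = make_grbf_map(dim=2, n_projections=4000, gamma=1.0)
x = [0.2, 0.8]
y = [0.5, 0.5]
approx = sum(a * b for a, b in zip(grbf(x), grbf(y)))
exact = exact_grbf(x, y, 1.0)
```

A linear classifier trained on `grbf(x)` then stands in for the non-linear GRBF kernel SVM, which is where the testing-time savings come from.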

Choice of features – PHOG features are used in our experiments – exp-$\chi^2$ performs better than $\chi^2$. Using Exact Kernels: comparison of PHOG and PHOW

Sparsity vs Performance