Local Features and Kernels for Classification of Object Categories
J. Zhang (QMUL, UK; at INRIA until July 2005), with M. Marszalek and C. Schmid (INRIA, France) and S. Lazebnik and J. Ponce (UIUC, USA)

Motivation
Why?
- Describe images, e.g. textures or object categories, with sets of sparse local features
- Handle object images under significant viewpoint changes
- Find better kernels
Resulting benefits:
- Stronger robustness and higher accuracy
- Better kernel evaluation

Outline
- Bag-of-features approach
  - Region extraction and description
  - Signature / histogram
  - Kernels and SVM
- Implementation choice evaluation
  - Detectors & descriptors, invariance, kernels
  - Influence of backgrounds
- Spatial bag of features
  - Spatial pyramid
  - Results on the PASCAL'06 validation data

Image Representation
Sparse: regions from interest points
- Harris-Laplace (H): HS, HSR and HA
- Laplacian (L): LS, LSR and LA
- Invariance levels: S (scale), SR (scale and rotation) and A (affine)
Dense: regions from points on a fixed multi-scale grid (5000 points per image)
Description: SIFT (Lowe, IJCV 2004) and SPIN images (Lazebnik et al., PAMI 2005)
[Figure: example Harris-Laplace and Laplacian detections]
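To make the dense variant concrete, here is a minimal sketch that computes SIFT descriptors on a fixed multi-scale grid with OpenCV. The grid step and patch sizes are illustrative assumptions, not the talk's exact settings; the sparse Harris-Laplace and Laplacian detectors are omitted since they are not part of stock OpenCV.

```python
# Minimal sketch: dense multi-scale SIFT on a fixed grid (assumed settings).
import cv2

def dense_sift(gray, step=8, sizes=(16, 24, 32)):
    """Compute SIFT descriptors at every grid point for several patch sizes."""
    sift = cv2.SIFT_create()
    keypoints = [
        cv2.KeyPoint(float(x), float(y), float(s))   # (x, y, patch diameter)
        for s in sizes
        for y in range(step, gray.shape[0] - step, step)
        for x in range(step, gray.shape[1] - step, step)
    ]
    _, descriptors = sift.compute(gray, keypoints)   # (num_points, 128)
    return descriptors

img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
desc = dense_sift(img)
```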

Image Signature / Histogram
Signature: cluster the features of each image and compare signatures with the Earth Mover's Distance (EMD) (Rubner et al., IJCV 2000):
$D(S_1, S_2) = \sum_{i,j} f_{ij}\, d(s_{1i}, s_{2j}) \,/\, \sum_{i,j} f_{ij}$
where $d(s_{1i}, s_{2j})$ is the ground distance between cluster centers and $f_{ij}$ is the flow.
Visual words: cluster the features of all training images and compare histograms with a histogram-based distance:
- Euclidean distance
- χ² distance
- histogram intersection
- any other histogram distance
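As a sketch of the histogram side, assuming L1-normalized visual-word histograms of equal length (the EMD on signatures needs a transportation solver, e.g. the POT package, and is omitted here):

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Symmetric chi-square distance: 0.5 * sum((h1 - h2)^2 / (h1 + h2))."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima."""
    return np.sum(np.minimum(h1, h2))

h1 = np.array([0.2, 0.5, 0.3])  # toy histograms
h2 = np.array([0.3, 0.3, 0.4])
print(chi2_distance(h1, h2), intersection(h1, h2))
```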

Kernel-based Classification
Kernelization:
- Traditional kernels: linear, RBF
- Extended Gaussian kernel: $K(S_i, S_j) = \exp(-\frac{1}{A} D(S_i, S_j))$, where $D$ is the EMD or the χ² distance and $A$ is a scaling parameter (the mean distance between training images)
- Resulting kernels: EMD kernel and χ² kernel
Combination:
- Direct sum
- A two-layer SVM classifier: SVM + χ² kernel, then SVM + RBF kernel on the outputs
Classification: SVM
- Binary SVM (object presence prediction)
- Multi-class SVM (multi-class classification)
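The extended Gaussian kernel plugs directly into an SVM through a precomputed Gram matrix. A minimal scikit-learn sketch, where `train_hists`, `test_hists` and `train_labels` are assumed placeholders and the scale A is set to the mean training distance (a common choice for this kernel):

```python
import numpy as np
from sklearn.svm import SVC

def chi2(a, b, eps=1e-10):
    """Symmetric chi-square distance between two histograms."""
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))

def pairwise_chi2(A, B):
    """Matrix of chi-square distances between rows of A and rows of B."""
    return np.array([[chi2(a, b) for b in B] for a in A])

D_train = pairwise_chi2(train_hists, train_hists)
A_scale = D_train.mean()                  # scaling parameter from training data
K_train = np.exp(-D_train / A_scale)      # K(Si, Sj) = exp(-D(Si, Sj) / A)

clf = SVC(kernel="precomputed").fit(K_train, train_labels)
K_test = np.exp(-pairwise_chi2(test_hists, train_hists) / A_scale)  # same A
predictions = clf.predict(K_test)
```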

Method Flow Chart
Bag of features + SVM classifier [Zhang, Marszalek, Lazebnik & Schmid, CVPR 2006 Workshop]

Outline
- Bag-of-features approach
  - Region extraction and description
  - Signature / histogram
  - Kernels and SVM
- Implementation choice evaluation
  - Detectors & descriptors, invariance, kernels
  - Influence of backgrounds
- Spatial bag of features
  - Spatial pyramid
  - Results on the PASCAL'06 validation data

UIUCTex: 25 classes, 40 samples each

Xerox7: 7 categories with varied backgrounds (bikes, books, buildings, cars, people, phones, trees)

Evaluation: Detectors and Descriptors

Xerox7 (10-fold cross-validation):
Channels    SIFT        SPIN    SIFT+SPIN
HS          92.0 ± …    …       … ± 2.1
LS          93.9 ± …    …       … ± 0.9
HS+LS       94.7 ± …    …       … ± 1.1

UIUCTex (20 training images per class):
Channels    SIFT        SPIN    SIFT+SPIN
HSR         97.1 ± …    …       … ± 0.6
LSR         97.7 ± …    …       … ± 0.6
HSR+LSR     98.0 ± …    …       … ± 0.5

(… marks table entries lost from the transcript.)

- LS performs better than HS, as it yields more points
- The combination of the two detectors is the best choice
- The Laplacian with SIFT alone is acceptable and has a lower computational cost

Evaluation: Invariance

Accuracy per invariance level (scale: HS, LS, HS+LS; scale and rotation: HSR, LSR, HSR+LSR; affine: HA, LA, HA+LA):
Datasets            HS         LS   HS+LS   HSR   LSR   HSR+LSR   HA   LA   HA+LA
UIUCTex             …          …    …       …     …     …         …    …    … ± 0.6
Xerox7 (10-fold)    92.0 ± …   …    …       …     …     …         …    …    … ± 1.8

- The best invariance level depends on the dataset
- Scale invariance is often sufficient for object categories
- Affine invariance is rarely an advantage

Evaluation: Kernels

Sparse representation; vocabulary histograms (linear, poly n=2, RBF and χ² kernels) vs. signatures (EMD+KNN and EMD kernel):
Datasets                       Linear     Poly (n=2)   RBF   χ²   EMD+KNN   EMD kernel
UIUCTex (20 training/class)    97.0 ± …   …            …     …    …         … ± 0.6
Xerox7 (10-fold)               79.8 ± …   …            …     …    …         … ± 1.7

- The EMD and χ² kernels give the best or comparable results
- A larger vocabulary usually gives higher accuracy: 93% on Xerox7 when using 1000 visual words instead of 70
- Experimental setting: signature size 40 for EMD; 10 clusters per class for the χ² kernel

Comparison with the State of the Art

Methods               Xerox7         CalTech6       Graz          Pascal05 Test 1    Pascal05 Test 2    CalTech101
(HS+LS)(SIFT+SPIN)    …              …              …             …                  …                  …
Others                82.0           96.6           83.7          …                  …                  …
                      Csurka et al.  Csurka et al.  Opelt et al.  Jurie and Triggs   Deselaers et al.   Grauman and Darrell
                      (ICPR 2004)    (ICPR 2004)    (ECCV 2004)   (ICCV 2005)        (CVPR)             (ICCV 2005)

- Results are the mean values of the accuracies
- Better results on five datasets and comparable results on the Pascal05 test set 1

Influence of Background
Questions:
- Are there background correlations?
- Do background features make recognition easier?
- What kinds of backgrounds are best for training?
Test bed: PASCAL 05
Protocol:
- Use bounding rectangles to separate foreground features (FF) from background features (BF)
- Introduce two additional background sets: backgrounds randomly shuffled among all images (BF-RAND) and a constant natural scene (BF-CONST)

Influence of Background
[Plots of classification performance for the different train/test feature combinations, e.g. BF/BF and combinations involving all features (AF)]

Conclusions on the Influence of Background
- Backgrounds do correlate with the foreground objects, but adding background features does not improve the performance of our method
- It is usually beneficial to train on a harder training set: classifiers trained on uncluttered or monotonous backgrounds tend to overfit, while classifiers trained on harder sets generalize well
- Add random background clutter to the training data if the backgrounds may not be representative of the test set
Based on these results, we include the hard examples (marked with 0) for training in PASCAL'06

Outline
- Bag-of-features approach
  - Region extraction and description
  - Signature / histogram
  - Kernels and SVM
- Implementation choice evaluation
  - Detectors & descriptors, invariance, kernels
  - Influence of backgrounds
- Spatial bag of features
  - Spatial pyramid
  - Results on the PASCAL'06 validation data

Spatial Pyramid for Bag of Features
- Pyramid comparison
- A two-layer SVM classifier: first layer with a χ² kernel; second layer with an RBF kernel
- Spatial pyramid matching kernel [Lazebnik, Schmid & Ponce, CVPR 2006]

Spatial Pyramid Matching Kernel
Histogram intersection at level $l$:
$I^l = \sum_i \min\big(H_X^l(i), H_Y^l(i)\big)$
Spatial pyramid matching kernel (a Mercer kernel):
$K^L(X, Y) = I^L + \sum_{l=0}^{L-1} \frac{1}{2^{L-l}} \big(I^l - I^{l+1}\big)$
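A minimal sketch of the kernel above, assuming `hX[l]` and `hY[l]` hold the concatenated, identically ordered cell histograms of the two images at pyramid level l (level 0 is the whole image):

```python
import numpy as np

def spm_kernel(hX, hY, L=2):
    """K^L = I^L + sum_{l=0}^{L-1} (I^l - I^{l+1}) / 2^(L - l)."""
    # Histogram intersection I^l at each level of the pyramid.
    I = [np.minimum(hX[l], hY[l]).sum() for l in range(L + 1)]
    return I[L] + sum((I[l] - I[l + 1]) / 2 ** (L - l) for l in range(L))

# Toy usage: level l has 4^l cells of 10 bins each.
rng = np.random.default_rng(0)
hX = [rng.random(10 * 4 ** l) for l in range(3)]
hY = [rng.random(10 * 4 ** l) for l in range(3)]
print(spm_kernel(hX, hY))
```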

PASCAL06 Experimental Settings
Regions:
- Sparse: HS+LS
- Dense: multi-scale fixed grid, 5000 points per image
Vocabulary: k-means clusters the descriptors of each class separately (300 clusters per class); the centers are then concatenated, giving 3000 visual words for both the sparse and the dense representations
Kernels: χ² kernel and spatial pyramid kernel
Bag of features: (HS+LS)(SIFT), denoted HS+LS, combined with the two-layer SVM classification strategy
Spatial pyramid:
- Train an SVM for each spatial level, then combine them with the two-layer SVM classification strategy
- Spatial pyramid matching kernel with levels up to 2
Classification: binary SVM with outputs normalized to [0, 1] by (x - min) / (max - min)
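A minimal sketch of the vocabulary step just described: k-means run per class (300 centers each), the centers concatenated into one codebook, then hard assignment of descriptors to visual words. `descriptors_by_class` is an assumed placeholder mapping class names to (n, 128) SIFT descriptor arrays.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors_by_class, k_per_class=300):
    """Cluster each class separately, then concatenate the cluster centers."""
    centers = [
        KMeans(n_clusters=k_per_class, n_init=1, random_state=0)
        .fit(d).cluster_centers_
        for d in descriptors_by_class.values()
    ]
    return np.vstack(centers)  # e.g. 10 classes x 300 = 3000 visual words

def quantize(descriptors, codebook):
    """L1-normalized histogram of nearest-visual-word assignments."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d2.argmin(1), minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```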

Methods Summary
- (HS+LS): the bag-of-keypoints method with a two-layer SVM classifier
- (LS)(PMK): Laplacian points with the spatial pyramid matching kernel
- (DS)(PMK): multi-scale dense points with the spatial pyramid matching kernel
- (DS)(PCh): multi-scale dense points with a two-layer spatial pyramid SVM
- (LS)(PCh): Laplacian points with a two-layer spatial pyramid SVM
- (LS): Laplacian points with a χ² kernel
- (HS+LS)(SUM): the bag-of-keypoints method with a sum of the distances

AUC on the VOC'06 Validation Set
[Table: per-class AUC (Bicycle, Bus, Car, Cat, Cow, Dog, Horse, Motorbike, Person, Sheep and the average) for HS+LS, (LS)(PCh), (DS)(PCh), (LS)(PMK), (DS)(PMK), LS and HS+LS (SUM); the numeric values did not survive the transcript]
- (HS+LS) and (LS)(PCh) are the best
- The two-layer SVM classifier is better than the spatial pyramid kernel: (LS)(PCh) > (LS)(PMK) and (DS)(PCh) > (DS)(PMK)
- Spatial information helps a bit: (LS)(PCh) >= LS

AUC Measures at Pyramid Levels 0, 1 and 2
[Table: per-class AUC at levels L0, L1 and L2 for the pyramid match kernel variants, including (LS)(PMK); the numeric values did not survive the transcript]
- L2 is usually better than L1 and L0 with the same vocabulary
- The larger the vocabulary, the smaller the improvement
- Comparable with LS, the plain bag of features, given a sufficient vocabulary

Conclusions
- Our approaches give excellent results; (HS+LS) and (LS)(PCh) are the best
- The sparse (interest point) representation performs better than the dense one (4830 vs. 5000 points)
- A two-layer spatial SVM classifier gives slightly better results than the pyramid matching kernel
- Spatial constraints help classification, but perform similarly to the bag of features with a sufficiently large vocabulary in the context of PASCAL'06