Multi-layer Orthogonal Codebook for Image Classification Presented by Xia Li
Outline
– Introduction
  – Motivation
  – Related work
– Multi-layer orthogonal codebook
– Experiments
– Conclusion
Image Classification
Pipeline: local feature extraction → visual codebook construction → vector quantization → spatial pooling → linear/nonlinear classifier
Sampling: dense (uniform grid) vs. sparse (at interest points). For object categorization, dense sampling offers better coverage. [Nowak, Jurie & Triggs, ECCV 2006]
Descriptor: orientation histograms within sub-patches build the 4×4×8 = 128-dim SIFT descriptor vector. [David Lowe, 1999, 2004]
Image credits: F-F. Li, E. Nowak, J. Sivic
Image Classification
Visual codebook construction
– Supervised vs. unsupervised clustering
– k-means (the typical choice), agglomerative clustering, mean-shift, …
Vector quantization via clustering
– Let the cluster centers in descriptor space be the prototype "visual words".
– Assign each new image patch descriptor to its closest cluster center.
Image credits: K. Grauman, B. Leibe
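The clustering and assignment steps above can be sketched in a few lines. `build_codebook` and `quantize` are hypothetical helper names, and the plain NumPy k-means below stands in for whatever library implementation a real system would use:

```python
import numpy as np

def build_codebook(descriptors, k, iters=20, seed=0):
    """Plain k-means: the cluster centers become the visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center (hard assignment)
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        # move each center to the mean of its members
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def quantize(descriptors, centers):
    """Map each descriptor to the index of its closest visual word."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.argmin(1)
```

In practice the quadratic-memory distance matrix would be replaced by a k-d tree or batched computation, but the logic is the same.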
Image Classification
Bags of visual words
– Represent the entire image by its distribution (histogram) of visual-word occurrences.
– Analogous to the bag-of-words representation used for document classification/retrieval.
Image credit: Fei-Fei Li
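A minimal sketch of the bag-of-words histogram, assuming the image's descriptors have already been quantized to word indices (`bow_histogram` is an illustrative name):

```python
import numpy as np

def bow_histogram(labels, k):
    """L1-normalised histogram of visual-word occurrences for one image."""
    h = np.bincount(labels, minlength=k).astype(float)
    return h / max(h.sum(), 1.0)
```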
Image Classification
Spatial pooling. [S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006]
Image credit: S. Lazebnik
Image Classification
Classifier kernels on histograms h1, h2:
– Histogram intersection kernel: K(h1, h2) = Σ_i min(h1(i), h2(i))
– Linear kernel: K(h1, h2) = Σ_i h1(i) h2(i)
Image credit: S. Lazebnik
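Both kernels are one-liners on normalized histograms; this sketch assumes NumPy arrays of equal length:

```python
import numpy as np

def intersection_kernel(h1, h2):
    """Histogram intersection: sum of element-wise minima."""
    return float(np.minimum(h1, h2).sum())

def linear_kernel(h1, h2):
    """Plain dot product of the two histograms."""
    return float(h1 @ h2)
```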
Image Classification
[S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006]
Image credit: S. Lazebnik
Motivation
Codebook quality depends on:
– Feature type
– Codebook creation: algorithm (e.g. k-means), distance metric (e.g. L2), number of words
– Quantization process:
  – Hard quantization: exactly one word is assigned to each descriptor
  – Soft quantization: multiple words may be assigned to each descriptor
Motivation
Quantization error
– The squared Euclidean distance between a descriptor vector and its mapped visual word.
– Hard quantization leads to large error.
Effect of hard descriptor quantization
– Severe drop in descriptor discriminative power.
Scatter plot of descriptor discriminative power before and after quantization (both axes in logarithmic scale). [O. Boiman, E. Shechtman, M. Irani, CVPR 2008]
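The per-descriptor quantization error defined above can be computed directly; a small NumPy sketch (the function name is illustrative):

```python
import numpy as np

def quantization_error(descriptors, centers):
    """Squared Euclidean distance from each descriptor to its nearest word."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d2.min(1)  # one error value per descriptor
```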
Motivation
Codebook size is an important factor for applications that need efficiency.
– Simply enlarging the codebook size can reduce the overall quantization error,
– but it cannot guarantee that every descriptor's error is reduced.
Table: codebook size vs. percent of descriptors; the right column is the percentage of descriptors whose quantization error is reduced when the codebook size grows (comparisons starting from a codebook of size 128).
Motivation
A good codebook for classification is:
– discriminative: small individual quantization error;
– compact in size.
These two goals are contradictory to some extent:
– Overemphasizing discriminative ability may increase the dictionary size and weaken its generalization ability.
– Over-compressing the dictionary loses information and, with it, discriminative power.
– Find a balance! [X. Lian, Z. Li, C. Wang, B. Lu, and L. Zhang, CVPR 2010]
Related Work
– No quantization: NBNN [6]
– Supervised codebook: probabilistic models [5]
– Unsupervised codebook: kernel codebook [2], sparse coding [3], locality-constrained linear coding [4]
Multi-layer Orthogonal Codebook (MOC)
– Uses standard k-means to keep efficiency; any other clustering algorithm can also be adopted.
– Builds codebooks from residues to reduce quantization errors explicitly.
MOC Creation
First-layer codebook: k-means on N descriptors randomly sampled to build the codebook (d_i denotes one such descriptor).
Residue: r_i = d_i − w(d_i), where w(d_i) is the first-layer word closest to d_i.
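The residue computation can be sketched as follows, assuming the first-layer codebook has already been built; the nearest-word rule and the function name are implementation assumptions:

```python
import numpy as np

def layer_residues(descriptors, centers):
    """Residue of each descriptor w.r.t. its nearest word on this layer."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    nearest = centers[d2.argmin(1)]  # assigned word for each descriptor
    return descriptors - nearest
```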
MOC Creation
Orthogonal residue: the residue made orthogonal to the assigned first-layer word.
Second-layer codebook: k-means on the orthogonal residues.
Third layer, …: repeat the procedure on the remaining residues.
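One plausible reading of "orthogonal residue" — removing the residue's component along its assigned word — can be sketched as below; the exact projection used in the paper may differ, so treat this as an assumption:

```python
import numpy as np

def orthogonal_residue(residues, words):
    """Project each residue onto the subspace orthogonal to its
    assigned word (rows of `words` match rows of `residues`).
    NOTE: assumed interpretation, not necessarily the paper's formula."""
    dots = (residues * words).sum(1, keepdims=True)
    norms = (words * words).sum(1, keepdims=True)
    return residues - dots / np.maximum(norms, 1e-12) * words
```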
Vector Quantization: how to use the MOC?
– Kernel fusion: use the layers separately.
  – Compute a kernel from each layer's codebook separately.
  – Let the final kernel be a combination of the per-layer kernels.
– Soft weighting: adjust the weights of words from different layers individually for each descriptor.
  – Select the nearest word on each layer's codebook for a descriptor.
  – Use the selected words from all layers to reconstruct that descriptor, minimizing the reconstruction error.
Hard Quantization and Kernel Fusion (HQKF)
– Hard quantization on each layer.
– Average pooling: with M sub-regions per image, the histogram for the m-th sub-region is the average of the hard-assignment vectors of the descriptors in that sub-region.
– Histogram intersection kernel per layer.
– Linearly combine the kernel values from each layer's codebook.
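The fusion step can be sketched as a weighted sum of per-layer histogram-intersection kernels; `hqkf_kernel` and the explicit weight list are illustrative choices, not the paper's notation:

```python
import numpy as np

def hqkf_kernel(hists_a, hists_b, weights):
    """Fuse histogram-intersection kernels across codebook layers.

    hists_a, hists_b: lists of per-layer histograms for two images.
    weights: per-layer combination weights."""
    return sum(w * float(np.minimum(ha, hb).sum())
               for w, ha, hb in zip(weights, hists_a, hists_b))
```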
Soft Weighting (SW)
– Weight the selected words for each descriptor (K is the codebook size).
– Max pooling.
– Linear kernel.
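Max pooling over the per-descriptor word weights of a (sub-)region is a one-liner; this sketch assumes each row of `codes` holds one descriptor's K word weights:

```python
import numpy as np

def max_pool(codes):
    """Keep, for each word, the largest weight over the region's descriptors."""
    return np.asarray(codes).max(0)
```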
Soft Weighting (SW-NN)
– Further considers the relationships between words from multiple layers.
– Select two or more nearest words on each layer's codebook, then weight them to reconstruct the descriptor.
– Each descriptor is represented more accurately by multiple words on each layer.
– The correlation between similar descriptors is captured by sharing words.
[J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, CVPR 2010]
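The reconstruction step can be approximated with unconstrained least squares over the selected nearest words; the paper's solver may impose additional constraints (e.g. locality, as in LLC), so this is only a sketch:

```python
import numpy as np

def soft_weights(descriptor, selected_words):
    """Weights that best reconstruct `descriptor` from the selected
    words, via unconstrained least squares (assumed simplification)."""
    B = np.asarray(selected_words, dtype=float).T  # dim x n_words
    w, *_ = np.linalg.lstsq(B, np.asarray(descriptor, float), rcond=None)
    return w
```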
Experiment
– Single feature type: SIFT, 16×16-pixel patches densely sampled over a grid with a spacing of 6 pixels.
– Spatial pyramid: 21 = 1 + 4 + 16 sub-regions at three resolution levels.
– Clustering method on each layer: k-means.
Datasets
– Caltech-101: 101 categories, with a varying number of images per category.
– 15 Scenes: 15 scene classes, 4485 images.
Quantization Error
– Quantization error is reduced more effectively by MOC than by simply enlarging the codebook size.
– Experiment done on Caltech-101.
Table: codebook size vs. percent of descriptors; the right column is the percentage of descriptors whose quantization error is reduced when the codebook changes (comparisons starting from codebooks of size 128, 256, and 512).
Codebook Size
Classification accuracy comparison with a single-layer codebook (Caltech-101). The 2-layer codebook has the same size on each layer, which also equals the size of the single-layer codebook.
Comparisons with Existing Methods
Classification accuracy (%) comparisons with existing methods; columns are Caltech-101 (two training-set sizes) and 15 Scenes; codebook sizes in parentheses.
– SPM [1]: 56.40 (200); 64.60 (200); ±0.5 (1024)
– KC [2]: –; 64.14±0.39
– ScSPM [3]: 67.0±0.45 (1024); 73.2±0.54 (1024); ±0.93 (1024)
– LLC [4]: 65.43 (2048)*; 73.44 (2048); –
– HQKF: ±0.7 (3-layer 512); ±0.8 (3-layer 512); ±0.6 (3-layer 1024)
– SW: ±0.5 (3-layer 512); ±1.1 (3-layer 512); ±0.6 (3-layer 1024)
– SW+2NN: ±0.5 (2-layer 1024); ±0.8 (2-layer 1024); –
Listed methods all use a single descriptor type. *Only LLC used HoG instead of SIFT; we repeated their method with the type of descriptors we use, obtaining a result of ±1.2.
Conclusion
Compared with existing methods, the proposed approach has the following merits:
– 1) Simple algorithm, easy to implement.
– 2) No time-consuming learning or clustering stage, so it can be applied in large-scale computer vision systems.
– 3) Even more efficient than traditional k-means clustering.
– 4) Explicit residue minimization exploits the discriminative power of descriptors.
– 5) The basic idea can be combined with many state-of-the-art methods.
References
[1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," CVPR 2006.
[2] J. van Gemert, J. Geusebroek, C. Veenman, and A. Smeulders, "Kernel codebooks for scene categorization," ECCV 2008.
[3] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," CVPR 2009.
[4] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," CVPR 2010.
[5] X. Lian, Z. Li, C. Wang, B. Lu, and L. Zhang, "Probabilistic models for supervised dictionary learning," CVPR 2010.
[6] O. Boiman, E. Shechtman, and M. Irani, "In defense of nearest-neighbor based image classification," CVPR, pp. 1-8, 2008.
Thank you!
Codebook Size
Different size combinations on a 2-layer MOC (Caltech-101): the x-axis is the size of the 1st-layer codebook; different colors represent the size of the 2nd-layer codebook.