Lecture IX: Object Recognition (2)

Slides:



Advertisements
Similar presentations
ECG Signal processing (2)
Advertisements

Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
An Introduction of Support Vector Machine
An Introduction of Support Vector Machine
Support Vector Machines
Quiz 1 on Wednesday ~20 multiple choice or short answer questions
Machine learning continued Image source:
Multi-layer Orthogonal Codebook for Image Classification Presented by Xia Li.
1 Part 1: Classical Image Classification Methods Kai Yu Dept. of Media Analytics NEC Laboratories America Andrew Ng Computer Science Dept. Stanford University.
CS4670 / 5670: Computer Vision Bag-of-words models Noah Snavely Object
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Global spatial layout: spatial pyramid matching Spatial weighting the features Beyond bags of features: Adding spatial information.
Classification and Decision Boundaries
Discriminative and generative methods for bags of features
Bag-of-features models Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
1 Image Recognition - I. Global appearance patterns Slides by K. Grauman, B. Leibe.
Lecture 28: Bag-of-words models
Beyond bags of features: Adding spatial information Many slides adapted from Fei-Fei Li, Rob Fergus, and Antonio Torralba.
Bag-of-features models
Lecture XI: Object Recognition (2)
By Suren Manvelyan,
Discriminative and generative methods for bags of features
Review: Intro to recognition Recognition tasks Machine learning approach: training, testing, generalization Example classifiers Nearest neighbor Linear.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Classification III Tamara Berg CS Artificial Intelligence Many slides throughout the course adapted from Svetlana Lazebnik, Dan Klein, Stuart Russell,
Step 3: Classification Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes Decision boundary Zebra.
Data mining and machine learning A brief introduction.
Classification 2: discriminative models
Support Vector Machine & Image Classification Applications
CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu Lecture 24 – Classifiers 1.
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Svetlana Lazebnik, Cordelia Schmid, Jean Ponce
An Introduction to Support Vector Machine (SVM) Presenter : Ahey Date : 2007/07/20 The slides are based on lecture notes of Prof. 林智仁 and Daniel Yeung.
SVM-KNN Discriminative Nearest Neighbor Classification for Visual Category Recognition Hao Zhang, Alex Berg, Michael Maire, Jitendra Malik.
Kernel Methods A B M Shawkat Ali 1 2 Data Mining ¤ DM or KDD (Knowledge Discovery in Databases) Extracting previously unknown, valid, and actionable.
Lecture 31: Modern recognition CS4670 / 5670: Computer Vision Noah Snavely.
Classifiers Given a feature representation for images, how do we learn a model for distinguishing features from different classes? Zebra Non-zebra Decision.
An Introduction to Support Vector Machines (M. Law)
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search The best paper prize at CVPR 2008.
Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.
Methods for classification and image representation
CS 1699: Intro to Computer Vision Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh October 29, 2015.
CS654: Digital Image Analysis
CS 2750: Machine Learning Support Vector Machines Prof. Adriana Kovashka University of Pittsburgh February 17, 2016.
CS598:V ISUAL INFORMATION R ETRIEVAL Lecture IV: Image Representation: Feature Coding and Pooling.
1 Kernel Machines A relatively new learning methodology (1992) derived from statistical learning theory. Became famous when it gave accuracy comparable.
Non-separable SVM's, and non-linear classification using kernels Jakob Verbeek December 16, 2011 Course website:
Another Example: Circle Detection
Neural networks and support vector machines
Photo: CMU Machine Learning Department Protests G20
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
Learning Mid-Level Features For Recognition
Support Vector Machines
Basic machine learning background with Python scikit-learn
Paper Presentation: Shape and Matching
By Suren Manvelyan, Crocodile (nile crocodile?) By Suren Manvelyan,
Object detection as supervised classification
An Introduction to Support Vector Machines
LINEAR AND NON-LINEAR CLASSIFICATION USING SVM and KERNELS
CS 1674: Intro to Computer Vision Scene Recognition
CS 2770: Computer Vision Intro to Visual Recognition
CS 2750: Machine Learning Support Vector Machines
Hyperparameters, bias-variance tradeoff, validation
Categorization by Learning and Combing Object Parts
COSC 4335: Other Classification Techniques
COSC 4368 Machine Learning Organization
EM Algorithm and its Applications
MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.
Presentation transcript:

Lecture IX: Object Recognition (2) CS558 Computer Vision Lecture IX: Object Recognition (2) Slides adapted from S. Lazebnik

Outline Object recognition: overview and history Object recognition: a learning based approach Object recognition: bag of feature representation Object recognition: classification algorithms

Bag-of-features models

Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Origin 1: Texture recognition histogram Universal texton dictionary Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001; Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Origin 2: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983)

Origin 2: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/

Origin 2: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/

Origin 2: Bag-of-words models Orderless document representation: frequencies of words from a dictionary Salton & McGill (1983) US Presidential Speeches Tag Cloud http://chir.ag/phernalia/preztags/

Bag-of-features steps Extract features Learn “visual vocabulary” Quantize features using visual vocabulary Represent images by frequencies of “visual words”

1. Feature extraction Regular grid or interest regions

1. Feature extraction Detect patches Compute descriptor Normalize patch Detect patches Slide credit: Josef Sivic

1. Feature extraction … Slide credit: Josef Sivic

2. Learning the visual vocabulary … Slide credit: Josef Sivic

2. Learning the visual vocabulary … Clustering Slide credit: Josef Sivic

2. Learning the visual vocabulary … Clustering Slide credit: Josef Sivic

K-means clustering Want to minimize sum of squared Euclidean distances between points xi and their nearest cluster centers mk Algorithm: Randomly initialize K cluster centers Iterate until convergence: Assign each data point to the nearest center Recompute each cluster center as the mean of all points assigned to it

Clustering and vector quantization Clustering is a common method for learning a visual vocabulary or codebook Unsupervised learning process Each cluster center produced by k-means becomes a codevector Codebook can be learned on separate training set Provided the training set is sufficiently representative, the codebook will be “universal” The codebook is used for quantizing features A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in a codebook Codebook = visual vocabulary Codevector = visual word

Example codebook … Appearance codebook Source: B. Leibe

Another codebook Appearance codebook … Source: B. Leibe

Yet another codebook Fei-Fei et al. 2005

Visual vocabularies: Issues How to choose vocabulary size? Too small: visual words not representative of all patches Too large: quantization artifacts, overfitting Computational efficiency Vocabulary trees (Nister & Stewenius, 2006)

Spatial pyramid representation Extension of a bag of features Locally orderless representation at several levels of resolution level 0 Lazebnik, Schmid & Ponce (CVPR 2006)

Spatial pyramid representation Extension of a bag of features Locally orderless representation at several levels of resolution level 1 level 0 Lazebnik, Schmid & Ponce (CVPR 2006)

Spatial pyramid representation Extension of a bag of features Locally orderless representation at several levels of resolution level 2 level 0 level 1 Lazebnik, Schmid & Ponce (CVPR 2006)

Scene category dataset Multi-class classification results (100 training images per class)

Caltech101 dataset http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html Multi-class classification results (30 training images per class)

Bags of features for action recognition Space-time interest points Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

Bags of features for action recognition Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.

Image classification Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Outline Object recognition: overview and history Object recognition: a learning based approach Object recognition: bag of feature representation Object recognition: classification algorithms

Classifiers Learn a decision rule assigning bag-of-features representations of images to different classes Decision boundary Zebra Non-zebra

Classification Assign input vector to one of two or more classes Any decision rule divides input space into decision regions separated by decision boundaries

Nearest Neighbor Classifier Assign label of nearest training data point to each test data point from Duda et al. Voronoi partitioning of feature space for two-category 2D and 3D data Source: D. Lowe

K-Nearest Neighbors For a new point, find the k closest points from training data Labels of the k points “vote” to classify Works well provided there is lots of data and the distance function is good k = 5 Source: D. Lowe

Functions for comparing histograms L1 distance: χ2 distance: Quadratic distance (cross-bin distance): Histogram intersection (similarity function):

Which hyperplane is best? Linear classifiers Find linear function (hyperplane) to separate positive and negative examples Which hyperplane is best?

Support vector machines Find hyperplane that maximizes the margin between the positive and negative examples C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Support vector machines Find hyperplane that maximizes the margin between the positive and negative examples For support vectors, Distance between point and hyperplane: Therefore, the margin is 2 / ||w|| Support vectors Margin C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin hyperplane Maximize margin 2/||w|| Correctly classify all training data: Quadratic optimization problem: Minimize Subject to yi(w·xi+b) ≥ 1 C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin hyperplane Solution: learned weight Support vector C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Finding the maximum margin hyperplane Solution: b = yi – w·xi for any support vector Classification function (decision boundary): Notice that it relies on an inner product between the test point x and the support vectors xi Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Nonlinear SVMs Datasets that are linearly separable work out great: But what if the dataset is just too hard? We can map it to a higher-dimensional space: x x x2 x Slide credit: Andrew Moore

Nonlinear SVMs General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x) Slide credit: Andrew Moore

Nonlinear SVMs The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(xi , xj) = φ(xi ) · φ(xj) (to be valid, the kernel function must satisfy Mercer’s condition) This gives a nonlinear decision boundary in the original feature space: C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Nonlinear kernel: Example Consider the mapping x2

Kernels for bags of features Histogram intersection kernel: Generalized Gaussian kernel: D can be L1 distance, Euclidean distance, χ2 distance, etc. J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local Features and Kernels for Classifcation of Texture and Object Categories: A Comprehensive Study, IJCV 2007

Summary: SVMs for image classification Pick an image representation (in our case, bag of features) Pick a kernel function for that representation Compute the matrix of kernel values between every pair of training examples Feed the kernel matrix into your favorite SVM solver to obtain support vectors and weights At test time: compute kernel values for your test example and each support vector, and combine them with the learned weights to get the value of the decision function

What about multi-class SVMs? Unfortunately, there is no “definitive” multi-class SVM formulation In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs One vs. others Traning: learn an SVM for each class vs. the others Testing: apply each SVM to test example and assign to it the class of the SVM that returns the highest decision value One vs. one Training: learn an SVM for each pair of classes Testing: each learned SVM “votes” for a class to assign to the test example

SVMs: Pros and cons Pros Cons Many publicly available SVM packages: http://www.kernel-machines.org/software Kernel-based framework is very powerful, flexible SVMs work very well in practice, even with very small training sample sizes Cons No “direct” multi-class SVM, must combine two-class SVMs Computation, memory During training time, must compute matrix of kernel values for every pair of examples Learning can take a very long time for large-scale problems

Summary: Classifiers Nearest-neighbor and k-nearest-neighbor classifiers L1 distance, χ2 distance, quadratic distance, histogram intersection Support vector machines Linear classifiers Margin maximization The kernel trick Kernel functions: histogram intersection, generalized Gaussian, pyramid match Multi-class Of course, there are many other classifiers out there Neural networks, boosting, decision trees, …

Visual Recognition Challenge: Final Project Visual Recognition Challenge:

Data set: Graz02 Bike, Cars, Person, None (background)

Train/Validation/Test Use the Train dataset to train the classifiers for recognizing the three object categories; Use the validation dataset to tune any parameters; Fix the parameters you tuned and report your final classification accuracy on the test images.

Feature Extraction Sample fixed-size image patches on regular grid of the image. You can use a single fixed image patch size, or have several different sizes. Sample random-size patches at random locations Sample fixed-size patches around the feature locations from, e.g., your own Harris corner detector or any other feature detector you found online. Note if you used code you downloaded online, please cite it clearly in your report.

Feature description It is recommended that you implement the SIFT descriptor described in the lecture slides for each image patch you generated. But you may also use other feature descriptors, such as the raw pixel values (bias-gain normalized). You don’t need to make your SIFT descriptor to be rotation invariant if you don’t want to do so.

Dictionary Computation Run k-means clustering on all training features to learn the dictionary. Setting k to be 500-1000 should be sufficient. You may experiment with different size.

Image Representation Given the features extracted from any image, quantize each feature to its closest code in the dictionary. For each image, you can accumulate a histogram by counting how many features are quantized to each code in the dictionary. This is your image representation.

Classifier training Given the histogram representation of all images, you may use a k-NN classifier or any other classifier you wish to use. If you wanted to push for recognition, you mean consider training an SVM classifier. Please feel free to find any SVM library to train your classifier. You may use the Matlab Toolbox provided in this link (http://theoval.cmp.uea.ac.uk/~gcc/svm/toolbox/).

Recognition Quality After you train the classifier, please report recognition accuracy on the test image set. It is the ratio of images you recognized correctly. Please also generate the so- called confusion matrix to enumerate, e.g, how many images you have mis-classified from one category to another category. You will need to report results from both validation and test dataset to see how your algorithm generalize.

Important Note Please follow the train/validation/test protocol and don’t heavily tune parameters on the test dataset. You will be asked to report recognition accuracy on both validation and test dataset. Tuning parameters on the test dataset is considered to be cheating.

Questions?