CS598: Visual Information Retrieval. Lecture IV: Image Representation: Feature Coding and Pooling.


CS598: Visual Information Retrieval. Lecture IV: Image Representation: Feature Coding and Pooling

Recap of Lecture III: blob detection; brief review of the Gaussian filter; scale selection; Laplacian of Gaussian (LoG) detector; Difference of Gaussian (DoG) detector; affine co-variant regions; learning local image descriptors (optional reading).

Outline: histogram of local features; bag of words model; soft quantization and sparse coding; supervector with Gaussian mixture model.

Lecture IV: Part I

Bag-of-features models

Origin 1: Texture recognition. Texture is characterized by the repetition of basic elements, or textons. For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters. Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Origin 1: Texture recognition. Universal texton dictionary; histogram. Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003

Origin 2: Bag-of-words models. Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)

Origin 2: Bag-of-words models. US Presidential Speeches Tag Cloud. Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)

Bag-of-features steps: 1. Extract features. 2. Learn a "visual vocabulary". 3. Quantize features using the visual vocabulary. 4. Represent images by frequencies of "visual words".

1. Feature extraction: regular grid or interest regions

1. Feature extraction: detect patches, normalize each patch, compute a descriptor. Slide credit: Josef Sivic
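For concreteness, a minimal sketch of this detect/normalize/describe step using OpenCV's SIFT (assumes opencv-python >= 4.4, where SIFT_create is available; the image path is a placeholder):

```python
import cv2

# Placeholder path; any grayscale image works here.
img = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)

# Detect interest regions and compute a 128-D descriptor for each patch.
# SIFT normalizes each patch internally for scale and dominant orientation.
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

print(len(keypoints), "patches, descriptor array shape:", descriptors.shape)
```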

… Slide credit: Josef Sivic

2. Learning the visual vocabulary. … Slide credit: Josef Sivic

2. Learning the visual vocabulary: clustering. … Slide credit: Josef Sivic

2. Learning the visual vocabulary: clustering → visual vocabulary. … Slide credit: Josef Sivic

K-means clustering. We want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k. Algorithm: randomly initialize K cluster centers, then iterate until convergence: (1) assign each data point to the nearest center; (2) recompute each cluster center as the mean of all points assigned to it.
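For reference, a minimal NumPy sketch of this algorithm (the initialization scheme, the iteration cap, and the names are my choices):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal k-means sketch: X is an (N, D) array of descriptors, K clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)]      # random initialization
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```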

Clustering and vector quantization. Clustering is a common method for learning a visual vocabulary or codebook: it is an unsupervised learning process; each cluster center produced by k-means becomes a codevector; the codebook can be learned on a separate training set; provided the training set is sufficiently representative, the codebook will be "universal". The codebook is used for quantizing features: a vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook. Codebook = visual vocabulary; codevector = visual word.
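A short sketch of how the learned codebook is then used for vector quantization and for building the bag-of-features histogram (function and variable names are mine):

```python
import numpy as np

def quantize(descriptors, codebook):
    """Hard quantization: map each descriptor to the index of its nearest codevector."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)                      # (N,) visual-word indices

def bag_of_words(descriptors, codebook):
    """Histogram of visual-word frequencies, L1-normalized."""
    words = quantize(descriptors, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```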

Example codebook: appearance codebook. … Source: B. Leibe

Another codebook: appearance codebook. … Source: B. Leibe

Yet another codebook Fei-Fei et al. 2005

Visual vocabularies: issues. How to choose the vocabulary size? Too small: visual words are not representative of all patches. Too large: quantization artifacts, overfitting. Computational efficiency: vocabulary trees (Nister & Stewenius, 2006).

Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution. Level 0. Lazebnik, Schmid & Ponce (CVPR 2006)

Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution. Levels 0 and 1. Lazebnik, Schmid & Ponce (CVPR 2006)

Spatial pyramid representation. Extension of a bag of features: a locally orderless representation at several levels of resolution. Levels 0, 1, and 2. Lazebnik, Schmid & Ponce (CVPR 2006)
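A hedged sketch of this pooling over hard-quantized visual words (the grid indexing, level set, and normalization are my simplifications; the original formulation additionally weights the levels when building the pyramid match kernel):

```python
import numpy as np

def spatial_pyramid(points, words, K, img_w, img_h, levels=(0, 1, 2)):
    """Concatenate visual-word histograms over a pyramid of grids.

    points: (N, 2) keypoint locations; words: (N,) int visual-word indices in [0, K).
    """
    feats = []
    for level in levels:
        cells = 2 ** level                       # cells x cells grid at this level
        cx = np.minimum((points[:, 0] / img_w * cells).astype(int), cells - 1)
        cy = np.minimum((points[:, 1] / img_h * cells).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                mask = (cx == i) & (cy == j)
                hist = np.bincount(words[mask], minlength=K).astype(float)
                feats.append(hist)
    feat = np.concatenate(feats)
    return feat / max(feat.sum(), 1.0)
```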

Scene category dataset: multi-class classification results (100 training images per class).

Caltech 101 dataset: multi-class classification results (30 training images per class).

Bags of features for action recognition: space-time interest points. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words," IJCV.

Bags of features for action recognition. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words," IJCV.

Image classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?

Lecture IV: Part II

Outline: histogram of local features; bag of words model; soft quantization and sparse coding; supervector with Gaussian mixture model.

Hard quantization. Slides credit: Cao & Feris

Soft quantization based on uncertainty. Quantize each local feature into the multiple code-words that are closest to it, splitting its weight appropriately. Gemert et al., "Visual Word Ambiguity," PAMI 2009
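A minimal sketch of soft assignment of this kind, using a Gaussian kernel of the distance to each codevector (the kernel choice, bandwidth sigma, and normalization are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def soft_histogram(descriptors, codebook, sigma=1.0):
    """Each descriptor splits its unit weight over codevectors by proximity."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))            # kernel weights, shape (N, K)
    w /= w.sum(axis=1, keepdims=True) + 1e-12       # per-feature weights sum to ~1
    return w.sum(axis=0)                            # pooled soft-count histogram
```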

Soft quantization: comparison of hard quantization and soft quantization.

Some experiments on soft quantization: improvement in classification rate from soft quantization.

Sparse coding. Hard quantization is an "extremely sparse" representation: each feature is assigned to exactly one codeword. Generalizing from it, we may pose soft quantization as a sparse reconstruction problem, but the exact (l0-constrained) version is hard to solve. In practice, we solve an l1-regularized sparse coding problem for quantization.
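As an illustration of the l1 relaxation just mentioned, a sketch of sparse coding for a single descriptor via iterative soft-thresholding (ISTA); lambda, the iteration count, and the code layout are assumptions:

```python
import numpy as np

def sparse_code(x, D, lam=0.1, n_iters=200):
    """Solve min_c 0.5*||x - D c||^2 + lam*||c||_1 by ISTA.

    x: (d,) descriptor; D: (d, K) codebook with codevectors as columns.
    """
    L = np.linalg.norm(D, 2) ** 2                 # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iters):
        grad = D.T @ (D @ c - x)                  # gradient of the quadratic term
        z = c - grad / L
        c = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return c                                      # sparse code: most entries are zero
```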

Soft quantization and pooling. [Yang et al., 2009]: Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification
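In that paper the sparse codes within each spatial region are pooled with a per-codeword maximum rather than an average; a tiny sketch (array names are mine):

```python
import numpy as np

def max_pool(codes):
    """codes: (N, K) sparse codes of the descriptors in one region -> (K,) pooled vector."""
    return np.abs(codes).max(axis=0)
```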

Outline: histogram of local features; bag of words model; soft quantization and sparse coding; supervector with Gaussian mixture model.

Quiz: what is the potential shortcoming of a bag-of-features representation based on quantization and pooling?

Modeling the feature distribution. The bag-of-features histogram represents the distribution of a set of features. The quantization step introduces information loss! How about modeling the distribution without quantization? Use a Gaussian mixture model.

Quiz

The Gaussian distribution. Multivariate Gaussian: N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\{-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\}, with mean \mu and covariance \Sigma. Define the precision to be the inverse of the covariance, \Lambda = \Sigma^{-1}. In 1 dimension: N(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\{-\frac{(x-\mu)^2}{2\sigma^2}\}. Slides credit: C. M. Bishop
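For reference, the density above evaluated directly in NumPy (a sketch; names are mine):

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density N(x | mu, Sigma)."""
    D = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)          # (x-mu)^T Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm
```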

Likelihood function. Data set X = \{x_1, \dots, x_N\}. Assume the observed data points are generated independently: p(X \mid \mu, \Sigma) = \prod_{n=1}^{N} N(x_n \mid \mu, \Sigma). Viewed as a function of the parameters, this is known as the likelihood function. Slides credit: C. M. Bishop

Maximum likelihood. Set the parameters by maximizing the likelihood function. Equivalently, maximize the log likelihood: \ln p(X \mid \mu, \Sigma) = -\frac{ND}{2}\ln(2\pi) - \frac{N}{2}\ln|\Sigma| - \frac{1}{2}\sum_{n=1}^{N}(x_n-\mu)^T \Sigma^{-1} (x_n-\mu). Slides credit: C. M. Bishop

Maximum likelihood solution. Maximizing w.r.t. the mean gives the sample mean \mu_{ML} = \frac{1}{N}\sum_{n=1}^{N} x_n. Maximizing w.r.t. the covariance gives the sample covariance \Sigma_{ML} = \frac{1}{N}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T. Slides credit: C. M. Bishop
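These two estimators computed in NumPy on a toy data set (the data here are placeholders):

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(1000, 2))    # toy data set, N x D

mu_ml = X.mean(axis=0)                                  # sample mean
diff = X - mu_ml
Sigma_ml = diff.T @ diff / len(X)                       # ML (biased) covariance
# Equivalent: np.cov(X, rowvar=False, ddof=0)
```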

Bias of maximum likelihood. Consider the expectations of the maximum likelihood estimates under the Gaussian distribution: E[\mu_{ML}] = \mu and E[\Sigma_{ML}] = \frac{N-1}{N}\Sigma. The maximum likelihood solution systematically under-estimates the covariance. This is an example of over-fitting. Slides credit: C. M. Bishop

Intuitive explanation of over-fitting. Slides credit: C. M. Bishop

Unbiased variance estimate. Clearly we can remove the bias by using \tilde{\Sigma} = \frac{N}{N-1}\Sigma_{ML} = \frac{1}{N-1}\sum_{n=1}^{N}(x_n-\mu_{ML})(x_n-\mu_{ML})^T, since this gives E[\tilde{\Sigma}] = \Sigma. For an infinite data set the two expressions are equal. Slides credit: C. M. Bishop
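NumPy's np.cov exposes both normalizations, which makes the distinction easy to check (toy data):

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(50, 3))
Sigma_ml = np.cov(X, rowvar=False, ddof=0)     # ML estimate, divides by N
Sigma_unbiased = np.cov(X, rowvar=False)       # default ddof=1, divides by N-1
```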

Gaussian mixtures. Linear superposition of Gaussians: p(x) = \sum_{k=1}^{K} \pi_k N(x \mid \mu_k, \Sigma_k). Normalization and positivity require 0 \le \pi_k \le 1 and \sum_{k=1}^{K} \pi_k = 1. We can interpret the mixing coefficients \pi_k as prior probabilities. Christopher M. Bishop, BCS Summer School, Exeter, 2003
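A small sketch of evaluating this mixture density with SciPy (function name and argument layout are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))
```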

Example: mixture of 3 Gaussians. Slides credit: C. M. Bishop

Contours of probability distribution. Slides credit: C. M. Bishop

Surface plot. Slides credit: C. M. Bishop

Sampling from the Gaussian mixture. To generate a data point: first pick one of the components with probability \pi_k, then draw a sample from that component. Repeat these two steps for each new data point. Slides credit: C. M. Bishop
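The two-step (ancestral) sampling procedure as a sketch (names and the RNG seed are mine):

```python
import numpy as np

def sample_gmm(n, pis, mus, Sigmas, seed=0):
    """Pick a component with probability pi_k, then sample from that Gaussian."""
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(pis), size=n, p=pis)              # component labels
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks
```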

Synthetic data set. Slides credit: C. M. Bishop

Fitting the Gaussian mixture. We wish to invert this process: given the data set, find the corresponding parameters (mixing coefficients, means, and covariances). If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster. Problem: the data set is unlabelled. We shall refer to the labels as latent (= hidden) variables. Slides credit: C. M. Bishop

Synthetic data set without labels. Slides credit: C. M. Bishop

Posterior probabilities. We can think of the mixing coefficients as prior probabilities for the components. For a given value of x we can evaluate the corresponding posterior probabilities, called responsibilities. These are given by Bayes' theorem: \gamma_k(x) \equiv p(k \mid x) = \frac{\pi_k N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x \mid \mu_j, \Sigma_j)}. Slides credit: C. M. Bishop
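Bayes' theorem above, written out as a small sketch (names are mine; this is the E-step quantity used later):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)."""
    lik = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=S)
                    for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)   # (N, K)
    return lik / lik.sum(axis=1, keepdims=True)
```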

Posterior probabilities (colour coded). Slides credit: C. M. Bishop

Posterior probability map. Slides credit: C. M. Bishop

Maximum likelihood for the GMM. The log likelihood function takes the form \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \{\sum_{k=1}^{K} \pi_k N(x_n \mid \mu_k, \Sigma_k)\}. Note: the sum over components appears inside the log. There is no closed-form solution for maximum likelihood! Slides credit: C. M. Bishop

Over-fitting in Gaussian mixture models. Singularities arise in the likelihood function when a component 'collapses' onto a data point: consider a component with \mu_k = x_n and covariance \sigma_k^2 I; its contribution to the likelihood grows without bound as \sigma_k \to 0. Also, the likelihood function gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components. Slides credit: C. M. Bishop

Problems and solutions. How to maximize the log likelihood: solved by the expectation-maximization (EM) algorithm. How to avoid singularities in the likelihood function: solved by a Bayesian treatment. How to choose the number K of components: also solved by a Bayesian treatment. Slides credit: C. M. Bishop

EM algorithm – informal derivation. Let us proceed by simply differentiating the log likelihood. Setting the derivative with respect to \mu_k equal to zero gives \mu_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk}) x_n, where N_k = \sum_{n=1}^{N} \gamma(z_{nk}) and \gamma(z_{nk}) is the responsibility of component k for x_n; this is simply the weighted mean of the data. Slides credit: C. M. Bishop

EM algorithm – informal derivation. Similarly for the covariances: \Sigma_k = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk}) (x_n-\mu_k)(x_n-\mu_k)^T. For the mixing coefficients, use a Lagrange multiplier to give \pi_k = \frac{N_k}{N}. Slides credit: C. M. Bishop

EM algorithm – informal derivation. The solutions are not in closed form since they are coupled. This suggests an iterative scheme for solving them: make initial guesses for the parameters, then alternate between the following two stages: 1. E-step: evaluate the responsibilities. 2. M-step: update the parameters using the ML results. Slides credit: C. M. Bishop
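Putting the E-step and M-step together, a minimal EM sketch for a GMM (the initialization, iteration cap, and the small covariance regularizer guarding against the singularities mentioned earlier are my choices):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: alternate E-step (responsibilities) and M-step (updates)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(N, K, replace=False)]              # initial guesses
    Sigmas = np.array([np.cov(X, rowvar=False) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: gamma[n, k] proportional to pi_k * N(x_n | mu_k, Sigma_k)
        gamma = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=S)
                          for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters using the current responsibilities
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        Sigmas = np.array([
            ((gamma[:, k, None] * (X - mus[k])).T @ (X - mus[k])) / Nk[k]
            + 1e-6 * np.eye(D)                            # regularize the covariance
            for k in range(K)
        ])
        pis = Nk / N
    return pis, mus, Sigmas
```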

(Several figure-only slides follow. Source: Christopher M. Bishop, BCS Summer School, Exeter, 2003.)

The supervector representation (1). Given a set of features from a set of images, train a Gaussian mixture model; this is called a Universal Background Model (UBM). Given the UBM and a set of features from a single image, adapt the UBM to the image's feature set by Bayesian EM (check the equations in the paper below), turning the original distribution into a new, image-specific distribution. Zhou et al., "A novel Gaussianized vector representation for natural scene categorization," ICPR 2008
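A heavily hedged sketch of the general idea: adapt the UBM means toward an image's features and stack the adapted means into one long vector. The relevance-factor form below is the classic MAP mean adaptation from speaker verification, used here only as an illustration; the exact Bayesian EM updates are in the Zhou et al. paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

def supervector(X, pis, mus, Sigmas, r=16.0):
    """Adapt UBM means to one image's descriptors X and stack them (illustrative MAP form)."""
    gamma = np.stack([pi * multivariate_normal.pdf(X, mean=mu, cov=S)
                      for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)
    gamma /= gamma.sum(axis=1, keepdims=True)              # responsibilities, (N, K)
    Nk = gamma.sum(axis=0)                                  # soft counts per component
    xbar = (gamma.T @ X) / np.maximum(Nk[:, None], 1e-12)   # per-component weighted means
    alpha = Nk / (Nk + r)                                   # adaptation weights
    adapted = alpha[:, None] * xbar + (1 - alpha)[:, None] * np.asarray(mus)
    return adapted.reshape(-1)                              # concatenated supervector
```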

The supervector representation (2). Zhou et al., "A novel Gaussianized vector representation for natural scene categorization," ICPR 2008