1
CS598: Visual Information Retrieval. Lecture IV: Image Representation: Feature Coding and Pooling
2
Recap of Lecture III: blob detection; brief review of the Gaussian filter; scale selection; Laplacian of Gaussian (LoG) detector; Difference of Gaussian (DoG) detector; affine co-variant regions; learning local image descriptors (optional reading)
3
Outline: histogram of local features; bag-of-words model; soft quantization and sparse coding; supervector with Gaussian mixture model
4
Lecture IV: Part I
5
Bag-of-Features Models
6
Origin 1: Texture Recognition. Texture is characterized by the repetition of basic elements, or textons. For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters. (Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003)
7
Origin 1: Texture Recognition. Universal texton dictionary; each texture is summarized by a histogram over the textons. (Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003)
8
Origin 2: Bag-of-Words Models. Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
9
Origin 2: Bag-of-Words Models. Example: US Presidential Speeches Tag Cloud (http://chir.ag/phernalia/preztags/). Orderless document representation: frequencies of words from a dictionary. Salton & McGill (1983)
12
Bag-of-Features Steps: 1. Extract features. 2. Learn a "visual vocabulary". 3. Quantize features using the visual vocabulary. 4. Represent images by frequencies of "visual words".
13
1. Feature Extraction: regular grid or interest regions.
14
1. Feature Extraction: detect patches, normalize each patch, compute a descriptor. Slide credit: Josef Sivic
15
… Slide credit: Josef Sivic
16
2. Learning the Visual Vocabulary. Slide credit: Josef Sivic
17
2. Learning the Visual Vocabulary: clustering. Slide credit: Josef Sivic
18
2. Learning the Visual Vocabulary: clustering yields the visual vocabulary. Slide credit: Josef Sivic
19
K-Means Clustering. Goal: minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k. Algorithm: randomly initialize K cluster centers, then iterate until convergence: (1) assign each data point to the nearest center; (2) recompute each cluster center as the mean of all points assigned to it.
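As a quick illustration of the algorithm on this slide, here is a minimal k-means sketch in NumPy (a hypothetical `kmeans` helper, not part of the course materials); it alternates the assignment and re-estimation steps described above.

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal k-means: X is (N, D); returns (K, D) cluster centers and labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # random initialization
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment step: index of the nearest center for every point
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(K)
        ])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, labels
```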
20
Clustering and Vector Quantization. Clustering is a common (unsupervised) method for learning a visual vocabulary, or codebook: each cluster center produced by k-means becomes a codevector. The codebook can be learned on a separate training set; provided the training set is sufficiently representative, the codebook will be "universal". The codebook is then used for quantizing features: a vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook. Codebook = visual vocabulary; codevector = visual word.
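A minimal vector-quantization sketch, assuming local descriptors stored row-wise in a NumPy array and a codebook learned as above; it maps each feature to its nearest codevector and accumulates the bag-of-features histogram. The helper names are illustrative.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature (row) to the index of the nearest codevector."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(features, codebook, normalize=True):
    """Bag-of-features histogram: frequency of each visual word in the image."""
    words = quantize(features, codebook)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    if normalize:
        hist /= max(hist.sum(), 1.0)  # relative frequencies
    return hist
```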
21
Example codebook (appearance codebook). Source: B. Leibe
22
Another codebook (appearance codebook). Source: B. Leibe
23
Yet another codebook. Fei-Fei et al., 2005
24
Visual Vocabularies: Issues. How to choose the vocabulary size? Too small: visual words are not representative of all patches. Too large: quantization artifacts, overfitting. Computational efficiency: vocabulary trees (Nister & Stewenius, 2006).
25
Spatial Pyramid Representation. An extension of bag-of-features: a locally orderless representation at several levels of resolution (level 0). Lazebnik, Schmid & Ponce (CVPR 2006)
26
Spatial Pyramid Representation. An extension of bag-of-features: a locally orderless representation at several levels of resolution (levels 0 and 1). Lazebnik, Schmid & Ponce (CVPR 2006)
27
Spatial Pyramid Representation. An extension of bag-of-features: a locally orderless representation at several levels of resolution (levels 0, 1, and 2). Lazebnik, Schmid & Ponce (CVPR 2006)
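To make the pyramid concrete, here is a sketch (not the authors' code) that builds the level-0/1/2 representation from features with known image coordinates: per-cell bag-of-features histograms over 1x1, 2x2, and 4x4 grids are concatenated. It reuses the `quantize()` helper sketched earlier; the per-level weighting from the CVPR 2006 paper is omitted for brevity.

```python
import numpy as np

def spatial_pyramid(features, xy, codebook, width, height, levels=(0, 1, 2)):
    """Concatenate per-cell BoW histograms over 1x1, 2x2, 4x4 grids."""
    words = quantize(features, codebook)          # nearest codevector per feature
    K = len(codebook)
    pyramid = []
    for level in levels:
        n = 2 ** level                            # n x n grid at this level
        cx = np.minimum((xy[:, 0] / width * n).astype(int), n - 1)
        cy = np.minimum((xy[:, 1] / height * n).astype(int), n - 1)
        cell = cy * n + cx                        # cell index of each feature
        for c in range(n * n):
            h = np.bincount(words[cell == c], minlength=K).astype(float)
            pyramid.append(h)
    return np.concatenate(pyramid)
```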
28
Scene Category Dataset. Multi-class classification results (100 training images per class).
29
Caltech 101 Dataset (http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html). Multi-class classification results (30 training images per class).
30
Bags of Features for Action Recognition: space-time interest points. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words", IJCV 2008.
31
Bags of Features for Action Recognition. Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, "Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words", IJCV 2008.
32
Image Classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing them?
33
Lecture IV: Part II
34
Outline: histogram of local features; bag-of-words model; soft quantization and sparse coding; supervector with Gaussian mixture model
35
Hard Quantization. Slides credit: Cao & Feris
36
Soft Quantization Based on Uncertainty. Quantize each local feature into the multiple codewords closest to it, splitting its weight appropriately. van Gemert et al., "Visual Word Ambiguity", PAMI 2009
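A sketch of uncertainty-based soft assignment in the spirit of the kernel-codebook idea above (not the paper's exact formulation; the bandwidth `sigma` is an assumed hyperparameter): each feature spreads its unit weight over codewords with a Gaussian kernel on the feature-to-codeword distance, and the weights are pooled into a soft histogram.

```python
import numpy as np

def soft_quantize(features, codebook, sigma=1.0):
    """Soft-assignment histogram: each feature splits its weight over codewords."""
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))      # Gaussian kernel on squared distance
    w /= w.sum(axis=1, keepdims=True)         # each feature contributes total weight 1
    return w.sum(axis=0)                      # pool per-feature weights into a histogram
```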
37
Soft Quantization: hard quantization vs. soft quantization (illustration).
38
Some Experiments on Soft Quantization: improvement in classification rate from soft quantization.
39
Sparse Coding. Hard quantization is an "extremely sparse" representation. Generalizing from it, we may consider relaxing the hard-assignment constraint to obtain a soft quantization, but the resulting problem is hard to solve. In practice, we instead solve a sparse coding problem for quantization.
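The equations on this slide were lost in the transcript; the following is the standard formulation (as used, e.g., in Yang et al., 2009): hard vector quantization of a feature over a codebook as a constrained least-squares problem, and its sparse-coding relaxation with an L1 penalty.

```latex
% Hard (vector) quantization of a feature x over codebook D = [d_1, ..., d_K]:
\min_{c}\; \| x - D c \|_2^2
\quad \text{s.t.}\quad \|c\|_0 = 1,\; \|c\|_1 = 1,\; c \ge 0
% Sparse coding relaxes the cardinality constraint with an L1 penalty:
\min_{c}\; \| x - D c \|_2^2 + \lambda \|c\|_1
```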
40
Soft Quantization and Pooling. [Yang et al., 2009]: Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification.
41
Outline: histogram of local features; bag-of-words model; soft quantization and sparse coding; supervector with Gaussian mixture model
42
Quiz: What is the potential shortcoming of a bag-of-features representation based on quantization and pooling?
43
Modeling the Feature Distribution. The bag-of-features histogram represents the distribution of a set of features, but the quantization step introduces information loss! How about modeling the distribution without quantization? Gaussian mixture model.
44
Quiz
45
The Gaussian Distribution. Multivariate Gaussian with mean and covariance; define the precision to be the inverse of the covariance. In 1 dimension this reduces to the familiar univariate normal. Slides credit: C. M. Bishop
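The density formulas on this slide did not survive the transcript; the standard forms are:

```latex
% Multivariate Gaussian with mean \mu, covariance \Sigma (precision \Lambda = \Sigma^{-1}):
\mathcal{N}(x \mid \mu, \Sigma) =
  \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}}
  \exp\!\left\{ -\tfrac{1}{2} (x-\mu)^{\mathsf T} \Sigma^{-1} (x-\mu) \right\}
% In 1 dimension:
\mathcal{N}(x \mid \mu, \sigma^2) =
  \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}
```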
46
Likelihood Function. Given a data set X = {x_1, ..., x_N}, assume the observed data points are generated independently. Viewed as a function of the parameters, the resulting joint density is known as the likelihood function. Slides credit: C. M. Bishop
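In this notation the likelihood and log likelihood (the standard expressions behind the slide's missing equations) are:

```latex
p(X \mid \mu, \Sigma) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \Sigma),
\qquad
\ln p(X \mid \mu, \Sigma) = \sum_{n=1}^{N} \ln \mathcal{N}(x_n \mid \mu, \Sigma)
```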
47
Maximum Likelihood. Set the parameters by maximizing the likelihood function, or equivalently by maximizing the log likelihood. Slides credit: C. M. Bishop
48
Maximum Likelihood Solution. Maximizing w.r.t. the mean gives the sample mean; maximizing w.r.t. the covariance gives the sample covariance. Slides credit: C. M. Bishop
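The resulting estimators (standard results, reconstructing the slide's missing equations):

```latex
\mu_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N} x_n,
\qquad
\Sigma_{\mathrm{ML}} = \frac{1}{N} \sum_{n=1}^{N}
  (x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{\mathsf T}
```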
49
Bias of Maximum Likelihood. Consider the expectations of the maximum likelihood estimates under the Gaussian distribution: the maximum likelihood solution systematically under-estimates the covariance. This is an example of over-fitting. Slides credit: C. M. Bishop
50
Intuitive Explanation of Over-fitting. Slides credit: C. M. Bishop
51
Unbiased Variance Estimate. We can remove the bias by rescaling the ML estimate, which gives an unbiased estimator; for an infinite data set the two expressions are equal. Slides credit: C. M. Bishop
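The corrected estimator the slide refers to is the standard one:

```latex
\widetilde{\Sigma} = \frac{N}{N-1}\, \Sigma_{\mathrm{ML}}
 = \frac{1}{N-1} \sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})(x_n - \mu_{\mathrm{ML}})^{\mathsf T},
\qquad
\mathbb{E}\!\left[\widetilde{\Sigma}\right] = \Sigma
```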
52
Gaussian Mixtures. A linear superposition of Gaussians; normalization and positivity constrain the mixing coefficients, which can be interpreted as prior probabilities. Slides credit: C. M. Bishop
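The mixture density and its constraints (standard form of the slide's equations):

```latex
p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
\sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1
```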
53
Example: Mixture of 3 Gaussians. Slides credit: C. M. Bishop
54
Contours of Probability Distribution. Slides credit: C. M. Bishop
55
Surface Plot. Slides credit: C. M. Bishop
56
Sampling from the Gaussian Mixture. To generate a data point: first pick one of the components with probability given by its mixing coefficient, then draw a sample from that component. Repeat these two steps for each new data point. Slides credit: C. M. Bishop
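A minimal sketch of this two-step (ancestral) sampling procedure; the parameter arrays `pis`, `mus`, `Sigmas` are assumed inputs, not values from the slides.

```python
import numpy as np

def sample_gmm(pis, mus, Sigmas, n_samples, seed=0):
    """Sample a GMM: pick a component by its mixing coefficient, then sample that Gaussian."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(pis), size=n_samples, p=pis)          # step 1: choose components
    samples = np.array([rng.multivariate_normal(mus[k], Sigmas[k])  # step 2: draw from it
                        for k in comps])
    return samples, comps
```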
57
Synthetic Data Set. Slides credit: C. M. Bishop
58
Fitting the Gaussian Mixture. We wish to invert this process: given the data set, find the corresponding parameters (mixing coefficients, means, and covariances). If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster. Problem: the data set is unlabelled. We shall refer to the labels as latent (hidden) variables. Slides credit: C. M. Bishop
59
Synthetic Data Set Without Labels. Slides credit: C. M. Bishop
60
Posterior Probabilities. We can think of the mixing coefficients as prior probabilities for the components. For a given value of x we can evaluate the corresponding posterior probabilities, called responsibilities, which are given by Bayes' theorem. Slides credit: C. M. Bishop
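The responsibility of component k for a point x (standard Bayes' theorem form of the slide's equation):

```latex
\gamma_k(x) \equiv p(k \mid x)
 = \frac{\pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}
        {\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x \mid \mu_j, \Sigma_j)}
```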
61
Posterior Probabilities (Colour Coded). Slides credit: C. M. Bishop
62
Posterior Probability Map. Slides credit: C. M. Bishop
63
Maximum Likelihood for the GMM. The log likelihood function takes the form shown below. Note: the sum over components appears inside the log, so there is no closed-form solution for maximum likelihood! Slides credit: C. M. Bishop
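```latex
\ln p(X \mid \pi, \mu, \Sigma)
 = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}
```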
64
Over-fitting in Gaussian Mixture Models. Singularities arise in the likelihood function when a component 'collapses' onto a data point: its variance shrinks towards zero and the likelihood diverges. The likelihood function also gets larger as we add more components (and hence parameters) to the model, so it is not clear how to choose the number K of components. Slides credit: C. M. Bishop
65
Problems and Solutions. How to maximize the log likelihood: solved by the expectation-maximization (EM) algorithm. How to avoid singularities in the likelihood function: solved by a Bayesian treatment. How to choose the number K of components: also solved by a Bayesian treatment. Slides credit: C. M. Bishop
66
EM Algorithm: Informal Derivation. Let us proceed by simply differentiating the log likelihood. Setting the derivative with respect to a component mean equal to zero gives an update in which each mean is the responsibility-weighted mean of the data. Slides credit: C. M. Bishop
67
EM Algorithm: Informal Derivation. Similarly for the covariances; for the mixing coefficients, use a Lagrange multiplier to enforce normalization. The resulting updates are summarized below. Slides credit: C. M. Bishop
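The M-step updates referred to on these two slides, written out (standard GMM results, with responsibilities gamma_k as defined earlier):

```latex
N_k = \sum_{n=1}^{N} \gamma_k(x_n), \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(x_n)\, x_n, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma_k(x_n)\,(x_n - \mu_k)(x_n - \mu_k)^{\mathsf T}, \qquad
\pi_k = \frac{N_k}{N}
```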
68
EM Algorithm: Informal Derivation. The solutions are not closed form since they are coupled. This suggests an iterative scheme: make initial guesses for the parameters, then alternate between two stages: 1. E-step: evaluate the responsibilities; 2. M-step: update the parameters using the ML results. Slides credit: C. M. Bishop
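A compact EM sketch for a GMM with diagonal covariances (a simplification chosen for readability; full covariances follow the same E-step/M-step pattern):

```python
import numpy as np

def em_gmm_diag(X, K, n_iters=50, seed=0, eps=1e-6):
    """EM for a diagonal-covariance GMM. X: (N, D). Returns (pis, mus, variances)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mus = X[rng.choice(N, size=K, replace=False)]        # initialize means at random points
    variances = np.tile(X.var(axis=0) + eps, (K, 1))     # initialize with the global variance
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E-step: responsibilities gamma (N, K), computed from log densities for stability
        log_p = (-0.5 * (((X[:, None, :] - mus[None]) ** 2) / variances[None]).sum(axis=2)
                 - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)[None]
                 + np.log(pis)[None])
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted ML updates
        Nk = gamma.sum(axis=0) + eps
        mus = (gamma.T @ X) / Nk[:, None]
        variances = (gamma.T @ (X ** 2)) / Nk[:, None] - mus ** 2 + eps
        pis = Nk / N
    return pis, mus, variances
```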
69
[Figure-only slides 69-74. Credit: Christopher M. Bishop, BCS Summer School, Exeter, 2003]
75
The Supervector Representation (1). Given a set of features from a set of images, train a Gaussian mixture model; this is called a Universal Background Model (UBM). Given the UBM and the set of features from a single image, adapt the UBM to that image's feature set by Bayesian EM (check the equations in the paper below). Zhou et al., "A novel Gaussianized vector representation for natural scene categorization", ICPR 2008. (Figure: original distribution vs. adapted distribution.)
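A rough sketch of the overall idea only: mean-only MAP adaptation of a diagonal-covariance UBM followed by stacking the adapted means into a supervector. This is an illustration, not the formulation of Zhou et al. (whose equations should be taken from the paper); the relevance factor `r` and the variance normalization are assumed choices.

```python
import numpy as np

def adapt_means(features, pis, mus, variances, r=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one image's features."""
    # Responsibilities of the UBM components for each feature (diagonal Gaussians)
    log_p = (-0.5 * (((features[:, None, :] - mus[None]) ** 2) / variances[None]).sum(axis=2)
             - 0.5 * np.log(2 * np.pi * variances).sum(axis=1)[None]
             + np.log(pis)[None])
    log_p -= log_p.max(axis=1, keepdims=True)
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)
    Nk = gamma.sum(axis=0)                                       # soft count per component
    Ex = (gamma.T @ features) / np.maximum(Nk, 1e-10)[:, None]   # per-component feature mean
    alpha = (Nk / (Nk + r))[:, None]                             # adaptation weight
    return alpha * Ex + (1.0 - alpha) * mus                      # interpolate towards the UBM

def supervector(adapted_mus, mus, variances):
    """Stack variance-normalized adapted means into one long vector."""
    return ((adapted_mus - mus) / np.sqrt(variances)).ravel()
```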
76
The Supervector Representation (2). Zhou et al., "A novel Gaussianized vector representation for natural scene categorization", ICPR 2008.