Download presentation
Presentation is loading. Please wait.
1
Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik
2
Goal
3
Nearest neighbor classification D (, )
4
Nearest neighbor classification D (, )
5
Learning a Distance Metric from Relative Comparisons [Schulz & Joachims, NIPS ’03] D (, ) D (, ) = ( - ) T ( - )
7
Approach image j image i
8
Approach image j image i d ji,m
9
Approach image j image i D ji =Σ w j,m d ji,m image k
10
Approach image j image i D ki image k D ji <
11
Core image j w j,m ? image j image i D ki image k D ji <
12
Derivations Notation Large-margin formulation Dual problem Solution
13
Notations D ji =Σ w j,m d ji,m D ji =w j ·d ji D ki > D ji w k ·d ki > w j ·d ji for triplet i, j, k w k ·d ki - w j ·d ji ≥ 1 W w 1 w 2 …w k …w j … X ijk 0 0 … d ki …-d ji … w k ·d ki - w j ·d ji ≥ 1W·X ijk ≥ 1
14
Large-margin formulation
15
SVM
19
Soft-margin SVM
20
Derivation
21
Dual
22
Details – Features and descriptors Find ~400 features per image Compute geometric blur descriptor
23
Descriptors Geometric blur
24
Descriptors Two sizes of geometric blur (42 pixels and 70 pixels) – Each is 204 dimensions (4 orientations and 51 samples each) HSV histograms of 42-pixel patches
25
Choosing triplets Caltech101 – at 15 images per class – 31.8 million triplets – Many are easy to satisfy For each image j, for each feature – Find the N images I with closest features – For each negative example i in I, form triplets (j, k, i) Eliminates ~ half of triplets
26
Choosing C
27
Train with multiple values of C, testing on a held- out part of the training set Choose whichever gives the best results For each C, run online version of the training algorithm – Make one sweep through training triplets – For each misclassified triplet (i,j,k), update weights for the three images – Choose C which gets the most right answers
28
Results At 15 training examples per class: 63.2% (~3% improvement) At 20 training examples per class: 66.6% (~5% improvement)
29
Results Confusion matrix Hardest categories: crocodile, cougar_body, cannon, bass
30
Questions Is there any disadvantage to a non-metric distance function? Could the images be embedded in a metric space? Why not learn everything? – Include a feature for each image pixel – Include multiple types of descriptors Could this be used for to do unsupervised learning for sets of tagged images (e.g., for image segmentation)? Can you learn a single distance per class?
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.