Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra.

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

Nearest neighbor classification D (, )

Learning a Distance Metric from Relative Comparisons [Schulz & Joachims, NIPS ’03] D (, ) D (, ) = ( - ) T ( - )

Approach image j image i

Approach image j image i d ji,m

Approach image j image i D ji =Σ w j,m d ji,m image k

Approach image j image i D ki image k D ji <

Core image j w j,m ? image j image i D ki image k D ji <

Derivations Notation Large-margin formulation Dual problem Solution

Notations D ji =Σ w j,m d ji,m D ji =w j ·d ji D ki > D ji w k ·d ki > w j ·d ji for triplet i, j, k w k ·d ki - w j ·d ji ≥ 1 W w 1 w 2 …w k …w j … X ijk 0 0 … d ki …-d ji … w k ·d ki - w j ·d ji ≥ 1W·X ijk ≥ 1

Large-margin formulation

Soft-margin SVM

Derivation

Details – Features and descriptors Find ~400 features per image Compute geometric blur descriptor

Descriptors Geometric blur

Descriptors Two sizes of geometric blur (42 pixels and 70 pixels) – Each is 204 dimensions (4 orientations and 51 samples each) HSV histograms of 42-pixel patches

Choosing triplets Caltech101 – at 15 images per class – 31.8 million triplets – Many are easy to satisfy For each image j, for each feature – Find the N images I with closest features – For each negative example i in I, form triplets (j, k, i) Eliminates ~ half of triplets

Choosing C

Train with multiple values of C, testing on a held- out part of the training set Choose whichever gives the best results For each C, run online version of the training algorithm – Make one sweep through training triplets – For each misclassified triplet (i,j,k), update weights for the three images – Choose C which gets the most right answers

Results At 15 training examples per class: 63.2% (~3% improvement) At 20 training examples per class: 66.6% (~5% improvement)

Results Confusion matrix Hardest categories: crocodile, cougar_body, cannon, bass

Questions Is there any disadvantage to a non-metric distance function? Could the images be embedded in a metric space? Why not learn everything? – Include a feature for each image pixel – Include multiple types of descriptors Could this be used for to do unsupervised learning for sets of tagged images (e.g., for image segmentation)? Can you learn a single distance per class?

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra.

Similar presentations

Presentation on theme: "Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra.

Similar presentations

Presentation on theme: "Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra."— Presentation transcript:

Similar presentations

About project

Feedback