Presentation is loading. Please wait.

Presentation is loading. Please wait.

Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra.

Similar presentations


Presentation on theme: "Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra."— Presentation transcript:

1 Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra Malik

2 Goal

3 Nearest neighbor classification D (, )

4 Nearest neighbor classification D (, )

5 Learning a Distance Metric from Relative Comparisons [Schulz & Joachims, NIPS ’03] D (, ) D (, ) = ( - ) T ( - )

6

7 Approach image j image i

8 Approach image j image i d ji,m

9 Approach image j image i D ji =Σ w j,m d ji,m image k

10 Approach image j image i D ki image k D ji <

11 Core image j w j,m ? image j image i D ki image k D ji <

12 Derivations Notation Large-margin formulation Dual problem Solution

13 Notations D ji =Σ w j,m d ji,m D ji =w j ·d ji D ki > D ji w k ·d ki > w j ·d ji for triplet i, j, k w k ·d ki - w j ·d ji ≥ 1 W w 1 w 2 …w k …w j … X ijk 0 0 … d ki …-d ji … w k ·d ki - w j ·d ji ≥ 1W·X ijk ≥ 1

14 Large-margin formulation

15 SVM

16

17

18

19 Soft-margin SVM

20 Derivation

21 Dual

22 Details – Features and descriptors Find ~400 features per image Compute geometric blur descriptor

23 Descriptors Geometric blur

24 Descriptors Two sizes of geometric blur (42 pixels and 70 pixels) – Each is 204 dimensions (4 orientations and 51 samples each) HSV histograms of 42-pixel patches

25 Choosing triplets Caltech101 – at 15 images per class – 31.8 million triplets – Many are easy to satisfy For each image j, for each feature – Find the N images I with closest features – For each negative example i in I, form triplets (j, k, i) Eliminates ~ half of triplets

26 Choosing C

27 Train with multiple values of C, testing on a held- out part of the training set Choose whichever gives the best results For each C, run online version of the training algorithm – Make one sweep through training triplets – For each misclassified triplet (i,j,k), update weights for the three images – Choose C which gets the most right answers

28 Results At 15 training examples per class: 63.2% (~3% improvement) At 20 training examples per class: 66.6% (~5% improvement)

29 Results Confusion matrix Hardest categories: crocodile, cougar_body, cannon, bass

30 Questions Is there any disadvantage to a non-metric distance function? Could the images be embedded in a metric space? Why not learn everything? – Include a feature for each image pixel – Include multiple types of descriptors Could this be used for to do unsupervised learning for sets of tagged images (e.g., for image segmentation)? Can you learn a single distance per class?


Download ppt "Learning Globally-Consistent Local Distance Functions for Shape-Based Image Retrieval and Classification Andrea Frome, Yoram Singer, Fei Sha, Jitendra."

Similar presentations


Ads by Google