Metric Learning by Collapsing Classes


1 Metric Learning by Collapsing Classes
Amir Globerson & Sam Roweis
Presented by Dumitru Erhan

2 Motivation
A good metric is essentially the same thing as good features.
Metrics should be problem specific, i.e., there is no "one size fits all" Euclidean metric. For instance, with the same features (images) we may face two different tasks: face recognition and gender identification.
A learned metric also gives more insight into the structure of the data, better visualization, etc.
A propos: feature extraction ≈ metric learning.

3 What is a good metric?
Elements in the same class should be close; elements in different classes should be far apart.
So why not go to the extreme: put same-class elements at zero distance and different-class elements at infinite distance?
This is the same target as the ideal case of spectral clustering.

4 Technically speaking…
Given $n$ examples $(x_i, y_i)$, with $x_i \in \mathbb{R}^r$ and $y_i \in \{1, \dots, k\}$.
Ideally, we look for $W$ such that after the projection $Wx$ the metric is "good".
Equivalently, learn a Mahalanobis distance $d(x_i, x_j \mid A) = d^A_{ij} = (x_i - x_j)^\top A (x_i - x_j)$, where $A = W^\top W$ is PSD.
We define $p^A(j \mid i) = e^{-d^A_{ij}} / Z_i$, where $Z_i = \sum_{k \neq i} e^{-d^A_{ik}}$.
Ideally, $p_0(j \mid i) \propto 1$ if $y_i = y_j$, and $0$ otherwise.
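A minimal NumPy sketch of these definitions (function names such as `mahalanobis_dists` and `conditional_probs` are mine, not the paper's); it assumes `X` is an n×r array of examples and `A` an r×r PSD matrix:

```python
import numpy as np

def mahalanobis_dists(X, A):
    """All pairwise distances d^A_ij = (x_i - x_j)^T A (x_i - x_j)."""
    diff = X[:, None, :] - X[None, :, :]                # shape (n, n, r)
    return np.einsum('ijr,rs,ijs->ij', diff, A, diff)   # shape (n, n)

def conditional_probs(X, A):
    """p^A(j|i) = exp(-d^A_ij) / Z_i, with Z_i summing over k != i."""
    d = mahalanobis_dists(X, A)
    np.fill_diagonal(d, np.inf)                          # exclude j == i from Z_i
    logits = -d
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)
```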

5 Technically speaking… part II
So we want to minimize, with respect to $A$, the following: $f(A) = \sum_i \mathrm{KL}\left[\, p_0(j \mid i) \,\|\, p^A(j \mid i) \,\right]$, s.t. $A$ is PSD.
This is convex! The PSD constraint is convex: for PSD $A_0, A_1$, any $A = \lambda A_0 + (1 - \lambda) A_1$ with $0 \le \lambda \le 1$ is PSD.
Objective function: $f(A) = -\sum_{i,\, j:\, y_j = y_i} \log p^A(j \mid i) = \sum_{i,\, j:\, y_j = y_i} \left( d^A_{ij} + \log Z_i \right)$.
$d^A_{ij}$ is linear in $A$; $\log Z_i$ is convex (a log-sum-exp of functions linear in $A$), so $f$ is convex.
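A sketch of this objective, reusing `mahalanobis_dists` from the previous snippet; the sum runs over same-class pairs with j ≠ i:

```python
import numpy as np
from scipy.special import logsumexp

def mcml_objective(X, y, A):
    """f(A) = -sum over same-class pairs (j != i) of log p^A(j|i)
            =  sum over those pairs of [d^A_ij + log Z_i]."""
    d = mahalanobis_dists(X, A)                  # from the previous sketch
    np.fill_diagonal(d, np.inf)                  # exclude j == i
    log_Z = logsumexp(-d, axis=1)                # log Z_i, computed stably
    same = (y[:, None] == y[None, :]) & ~np.eye(len(y), dtype=bool)
    # each same-class pair (i, j) contributes d_ij + log Z_i
    return d[same].sum() + (same.sum(axis=1) * log_Z).sum()
```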

6 Duality and Optimization
There is a dual form (an entropy maximization problem), but it is not useful: it has $O(n^2)$ variables, whereas the primal has only about $r^2/2$ (the entries of the symmetric matrix $A$).
Some optimization details:
Initialize $A$ to some random matrix.
Take a small step in the direction of the negative gradient.
Project back onto the "PSD cone" (remove negative eigenvalues).
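A rough sketch of this projected-gradient loop. The gradient expression (weights $p_0(j \mid i) - p^A(j \mid i)$ on outer products of differences) is my reading of the paper, $p_0$ is taken uniform over same-class neighbours, and the step size, iteration count, and random initialization are placeholders, not the authors' settings:

```python
import numpy as np

def project_psd(M):
    """Euclidean projection onto the PSD cone: zero out negative eigenvalues."""
    w, V = np.linalg.eigh((M + M.T) / 2)
    return (V * np.clip(w, 0.0, None)) @ V.T

def mcml_gradient(X, y, A):
    """df/dA = sum_{i,j} (p_0(j|i) - p^A(j|i)) (x_i - x_j)(x_i - x_j)^T,
    with p_0(j|i) uniform over same-class j != i (assumes every class
    has at least two examples)."""
    n = len(y)
    p_A = conditional_probs(X, A)                         # from the earlier sketch
    same = (y[:, None] == y[None, :]) & ~np.eye(n, dtype=bool)
    p_0 = same / same.sum(axis=1, keepdims=True)
    diff = X[:, None, :] - X[None, :, :]
    return np.einsum('ij,ijr,ijs->rs', p_0 - p_A, diff, diff)

def fit_metric(X, y, n_iters=200, lr=1e-3, seed=0):
    """Projected gradient descent: step along the negative gradient, then project."""
    rng = np.random.default_rng(seed)
    r = X.shape[1]
    A = project_psd(rng.standard_normal((r, r)))          # random PSD initialisation
    for _ in range(n_iters):
        A = project_psd(A - lr * mcml_gradient(X, y, A))
    return A
```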

7 Dimensionality Reduction and Kernels
Rank of A  dimension of the projection Rank constraints not convex But we could find A as above Keep the components with the q largest eigs Not the same result as rank constraint But they say it’s still good Kernels: objective function freg(A) = i KL[p0(j|i) | pA(j|i) ] + Tr(A) February-24-19 Best of NIPS 2005
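A sketch of this post-hoc truncation (not the rank-constrained optimum); `projection_from_A` is a hypothetical helper that keeps the top-q eigen-components of the learned $A$:

```python
import numpy as np

def projection_from_A(A, q):
    """Keep the q largest eigen-components of A: returns a q x r matrix W
    with W^T W ~= A, so x -> W x projects into R^q."""
    w, V = np.linalg.eigh(A)                    # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:q]               # indices of the q largest
    return np.sqrt(np.clip(w[idx], 0.0, None))[:, None] * V[:, idx].T
```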

8 Results I
Setup: datasets from the UCI repository, USPS digits, and YALE faces.
Protocol: learn a metric, then classify with 1-NN.
Compared with: Fisher's LDA; Xing et al.'s method (minimizes the mean within-class distance while keeping between-class distances larger than one); PCA.
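A hypothetical sketch of this protocol using scikit-learn's 1-NN classifier; `X_train`/`y_train`/`X_test`/`y_test` stand for an assumed split, the helpers `fit_metric` and `projection_from_A` come from the earlier sketches, and the choice q = 2 is arbitrary:

```python
from sklearn.neighbors import KNeighborsClassifier

# X_train, y_train, X_test, y_test: assumed to be given (hypothetical split).
A = fit_metric(X_train, y_train)        # projected-gradient sketch above
W = projection_from_A(A, q=2)           # small q for visualization; use q = r for the full metric
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train @ W.T, y_train)
print("1-NN accuracy:", knn.score(X_test @ W.T, y_test))
```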

9 Results II

10 Non-Convex Variant: Neighbourhood Components Analysis
Optimize $W$ directly rather than $A$ (the problem is no longer convex).
Neighbourhood Components Analysis minimizes (a soft surrogate of) the LOO error of k-NN.
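For contrast, a sketch of the NCA objective as described by Goldberger et al.: a smooth stand-in for the leave-one-out nearest-neighbour error, maximised directly over $W$; the function name and the suggestion of gradient ascent are assumptions, not their code:

```python
import numpy as np
from scipy.special import logsumexp

def nca_objective(X, y, W):
    """NCA's soft leave-one-out score: expected number of points whose stochastic
    nearest neighbour (in the projected space W x) has the same label."""
    Z = X @ W.T                                               # project; non-convex in W
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)   # squared Euclidean distances
    np.fill_diagonal(d, np.inf)                               # a point cannot pick itself
    log_p = -d - logsumexp(-d, axis=1, keepdims=True)         # log p_ij
    same = y[:, None] == y[None, :]
    return np.exp(log_p)[same].sum()                          # maximise over W, e.g. by gradient ascent
```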

11 Results III

12 Discussion
Main idea: same class close, different classes far.
Only suitable for uni-modal class distributions; some sort of EM-like algorithm could help it out?
Suitable for dimensionality reduction (global: from one component up to all dimensions) and for kernels.

13 References
E. Xing, A. Ng, M. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems (NIPS), 2004.
J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In Advances in Neural Information Processing Systems (NIPS), 2004.

