Invariant Large Margin Nearest Neighbour Classifier
M. Pawan Kumar, Philip Torr, Andrew Zisserman


Invariant Large Margin Nearest Neighbour Classifier. M. Pawan Kumar, Philip Torr, Andrew Zisserman

Aim: To learn a distance metric for invariant nearest neighbour classification. [Figure: training data]

Aim: To learn a distance metric for invariant nearest neighbour classification. [Figure: target pairs]

Aim: To learn a distance metric for invariant nearest neighbour classification. [Figure: impostor pairs] Problem: Euclidean distance may not provide the correct nearest neighbours. Solution: learn a mapping to a new space.

Aim: To learn a distance metric for invariant nearest neighbour classification. Bring target pairs closer; move impostor pairs away.

Aim: To learn a distance metric for invariant nearest neighbour classification. [Figure: Euclidean distance vs. learnt distance]

Aim: To learn a distance metric for invariant nearest neighbour classification. [Figure: transformation trajectories] Learn a mapping to a new space.

Aim: To learn a distance metric for invariant nearest neighbour classification. Bring target trajectory pairs closer; move impostor trajectory pairs away.

Aim: To learn a distance metric for invariant nearest neighbour classification. [Figure: Euclidean distance vs. learnt distance, with trajectories]

Motivation: Face Recognition in TV Video. Frames I_1, I_2, I_3, I_4, ..., I_n are mapped to feature vectors. Euclidean distance may not give the correct nearest neighbours, so learn a distance metric.

Motivation: Face Recognition in TV Video. Invariance to changes in the position of features.

Outline: Large Margin Nearest Neighbour (LMNN); Preventing Overfitting; Polynomial Transformations; Invariant LMNN (ILMNN); Experiments.

LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005). Learns a distance metric for nearest neighbour classification by learning a mapping L: x → Lx. Bring target pairs closer; move impostor pairs away. [Figure: points x_i, x_j, x_k]

LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005). Distance between x_i and x_j: D(i,j) = (x_i − x_j)^T L^T L (x_i − x_j).

LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005). With M = L^T L, the distance is D(i,j) = (x_i − x_j)^T M (x_i − x_j). Minimize Σ_ij D(i,j) subject to M ⪰ 0: a convex semidefinite program (SDP) with a global minimum.

LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005). Margin constraints: D(i,k) − D(i,j) ≥ 1 − e_ijk, e_ijk ≥ 0. Minimize Σ_ijk e_ijk subject to M ⪰ 0: again a convex SDP.

LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005). Full problem: minimize Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk subject to M ⪰ 0, D(i,k) − D(i,j) ≥ 1 − e_ijk, e_ijk ≥ 0. Solve to obtain the optimal M. Complexity: polynomial in the number of points.
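To make the program concrete, here is a minimal sketch of the LMNN SDP in cvxpy. The toy data, the target and impostor index lists, and the value of Λ_H are illustrative stand-ins, not the paper's setup.

```python
# A minimal sketch of the LMNN SDP; toy data and weights only.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 toy points in R^5
targets = [(0, 1), (2, 3)]            # (i, j): target pairs (same class, nearby)
impostors = [(0, 1, 4), (2, 3, 5)]    # (i, j, k): x_k carries a different label
lambda_h = 1.0                        # hinge-loss weight Lambda_H (illustrative)

d = X.shape[1]
M = cp.Variable((d, d), PSD=True)     # M = L^T L, constrained positive semidefinite

def dist(i, j):
    v = X[i] - X[j]
    return cp.quad_form(v, M)         # D(i,j) = (x_i - x_j)^T M (x_i - x_j)

e = cp.Variable(len(impostors), nonneg=True)         # slack variables e_ijk
constraints = [dist(i, k) - dist(i, j) >= 1 - e[t]   # margin constraints
               for t, (i, j, k) in enumerate(impostors)]

base_cost = sum(dist(i, j) for i, j in targets) + lambda_h * cp.sum(e)
cp.Problem(cp.Minimize(base_cost), constraints).solve()
print(np.round(M.value, 3))           # the learnt Mahalanobis matrix
```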

LMNN Classifier (Weinberger, Blitzer and Saul, NIPS 2005). Advantages: trivial extension to multiple classes; efficient polynomial-time solution. Disadvantages: a large number of degrees of freedom (risk of overfitting?); does not model the invariance of the data.

Outline: Large Margin Nearest Neighbour (LMNN); Preventing Overfitting; Polynomial Transformations; Invariant LMNN (ILMNN); Experiments.

L2-Regularized LMNN Classifier (L2-LMNN). Regularize the Frobenius norm of L: ||L||_F^2 = tr(M) = Σ_i M_ii. Minimize Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk + Λ_R Σ_i M_ii subject to M ⪰ 0, D(i,k) − D(i,j) ≥ 1 − e_ijk, e_ijk ≥ 0.

Diagonal LMNN (D-LMNN). Learn a diagonal L matrix, i.e. a diagonal M matrix. Minimize Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk subject to M ⪰ 0, D(i,k) − D(i,j) ≥ 1 − e_ijk, e_ijk ≥ 0, M_ij = 0 for i ≠ j: the problem reduces to a linear program.

Diagonally Dominant LMNN (DD-LMNN). Minimize the 1-norm of the off-diagonal elements of M: minimize Σ_ij D(i,j) + Λ_H Σ_ijk e_ijk + Λ_R Σ_ij t_ij subject to M ⪰ 0, D(i,k) − D(i,j) ≥ 1 − e_ijk, e_ijk ≥ 0, t_ij ≥ M_ij and t_ij ≥ −M_ij for i ≠ j.
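Continuing the cvxpy sketch above, the three regularized variants only change the objective or add constraints; the value of Λ_R is again made up.

```python
# Regularized variants of the LMNN sketch above; lambda_r is illustrative.
lambda_r = 0.1
off_diag = M - cp.diag(cp.diag(M))    # off-diagonal part of M

# L2-LMNN: penalize trace(M) = sum_i M_ii = ||L||_F^2
l2_lmnn = cp.Problem(cp.Minimize(base_cost + lambda_r * cp.trace(M)), constraints)

# D-LMNN: force M diagonal; mathematically the SDP collapses to a linear program
d_lmnn = cp.Problem(cp.Minimize(base_cost), constraints + [off_diag == 0])

# DD-LMNN: penalize the 1-norm of the off-diagonal entries (the t_ij trick)
dd_lmnn = cp.Problem(cp.Minimize(base_cost + lambda_r * cp.sum(cp.abs(off_diag))),
                     constraints)
```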

LMNN Classifier. What about invariance to known transformations? Appending the input data with transformed versions is inefficient and inaccurate. Can we add invariance to LMNN? No, not for a general transformation; yes, for some types of transformations.

Outline: Large Margin Nearest Neighbour (LMNN); Preventing Overfitting; Polynomial Transformations; Invariant LMNN (ILMNN); Experiments.

Polynomial Transformations. Rotate x = (a, b)^T by an angle θ:
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} \approx \begin{pmatrix} 1-\theta^2/2 & -(\theta-\theta^3/6) \\ \theta-\theta^3/6 & 1-\theta^2/2 \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} \quad \text{(Taylor series)}$$

Polynomial Transformations. Collecting powers of θ, the rotated point is linear in the monomials (1, θ, θ², θ³):
$$T(\theta, x) = \begin{pmatrix} a & -b & -a/2 & b/6 \\ b & a & -b/2 & -a/6 \end{pmatrix}\begin{pmatrix} 1 \\ \theta \\ \theta^2 \\ \theta^3 \end{pmatrix} = X\,\bar\theta$$
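A quick numerical check of this factorization; the helper below is built directly from the matrix on this slide, and the angle and point are arbitrary choices.

```python
# Sketch: cubic Taylor factorization T(theta, x) = X @ theta_bar of a 2D rotation.
import numpy as np

def T(theta, x):
    a, b = x
    Xmat = np.array([[a, -b, -a / 2,  b / 6],
                     [b,  a, -b / 2, -a / 6]])
    theta_bar = np.array([1.0, theta, theta**2, theta**3])
    return Xmat @ theta_bar

theta = np.deg2rad(5.0)               # a small rotation, as in the experiments
x = np.array([2.0, 1.0])              # an arbitrary 2D point
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(T(theta, x))                    # cubic approximation
print(R @ x)                          # exact rotation; nearly identical
```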

Why are Polynomials Special? The squared distance along a transformation trajectory is a polynomial in the transformation parameters (θ₁, θ₂). Writing it as a sum of squares of polynomials makes its non-negativity equivalent to a semidefinite constraint P ⪰ 0 on its Gram matrix: the SD-representability of polynomials (Lasserre, 2001).

Why are Polynomials Special? The same argument applies with a margin: requiring the distance polynomial to stay above a bound is again a sum-of-squares condition, equivalent to a second semidefinite constraint P′ ⪰ 0.
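A compact restatement of the sum-of-squares argument, in notation chosen here rather than taken from the slides (b(θ) is a monomial basis, P a Gram matrix):

```latex
% Sum of squares as a semidefinite constraint (notation chosen for this sketch).
p(\theta)\ \text{is a sum of squares}
  \iff \exists\, P \succeq 0 :\; p(\theta) = b(\theta)^\top P\, b(\theta),
  \qquad b(\theta) = (1,\ \theta_1,\ \theta_2,\ \theta_1\theta_2,\ \dots)^\top.
% If P \succeq 0 then p(\theta) = \| P^{1/2} b(\theta) \|^2 \ge 0 for every \theta,
% so one semidefinite constraint certifies non-negativity over all parameters.
```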

Outline: Large Margin Nearest Neighbour (LMNN); Preventing Overfitting; Polynomial Transformations; Invariant LMNN (ILMNN); Experiments.

ILMNN Classifier. Learns a distance metric for invariant nearest neighbour classification by learning a mapping L: x → Lx. Bring target trajectories closer; move impostor trajectories away. [Figure: polynomial trajectories through x_i, x_j, x_k]

ILMNN Classifier. Learns a distance metric for invariant nearest neighbour classification, with M ⪰ 0. Over the polynomial trajectories, minimize the maximum distance between target pairs and maximize the minimum distance between impostor pairs.

ILMNN Classifier. Use SD-representability: each trajectory constraint becomes one semidefinite constraint. Solve for M in polynomial time. Add regularizers to prevent overfitting.
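One way to spell out how a trajectory constraint becomes a semidefinite one, again in notation chosen here rather than the paper's:

```latex
% Along polynomial trajectories T(\theta, x_i) = X_i \bar\theta, the squared distance
D_{ij}(\theta) = (X_i\bar\theta - X_j\bar\theta)^\top M\, (X_i\bar\theta - X_j\bar\theta)
% is itself a polynomial in \theta.  Target pairs: minimize u subject to
u - D_{ij}(\theta)\ \text{being a sum of squares};
% impostor pairs: maximize l subject to
D_{ik}(\theta) - l\ \text{being a sum of squares}.
% By SD-representability, each condition is a single Gram-matrix constraint P \succeq 0.
```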

Outline: Large Margin Nearest Neighbour (LMNN); Preventing Overfitting; Polynomial Transformations; Invariant LMNN (ILMNN); Experiments.

Dataset. 24,244 faces (with ground-truth labelling*) of 11 characters from an episode of “Buffy the Vampire Slayer”. *Thanks to Josef Sivic and Mark Everingham.

Dataset Splits. Experiment 1: random permutation of the dataset; 30% training, 30% validation (to estimate Λ_H and Λ_R), 40% testing. Suitable for nearest neighbour-type classification. Experiment 2: first 30% training, next 30% validation, last 40% testing. Not so suitable for nearest neighbour-type classification.

Incorporating Invariance. Invariance of feature position to Euclidean transformations: −5° ≤ θ ≤ 5°, −3 ≤ t_x ≤ 3 pixels, −3 ≤ t_y ≤ 3 pixels. Approximated by a degree-2 polynomial using a Taylor series; derivatives approximated as image differences. [Figure: image and rotated image]

Incorporating Invariance. Invariance of feature position to Euclidean transformations: −5° ≤ θ ≤ 5°, −3 ≤ t_x ≤ 3 pixels, −3 ≤ t_y ≤ 3 pixels. Approximated by a degree-2 polynomial using a Taylor series; derivatives approximated as image differences. [Figure: derivative computed as a difference of smoothed images]
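A sketch of the derivative-by-differences idea; scipy's Gaussian filter and the one-pixel shift are stand-ins for whatever smoothing and step size the authors actually used.

```python
# Sketch: approximate an image derivative as a difference of smoothed images.
import numpy as np
from scipy.ndimage import gaussian_filter, shift

rng = np.random.default_rng(0)
img = rng.random((64, 64))                       # toy image
smooth = gaussian_filter(img, sigma=1.0)         # smooth to suppress noise
d_dx = shift(smooth, (0, -1), order=1) - smooth  # finite difference along x
```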

Training the Classifiers. Problem: because of within-shot faces, Euclidean distance already gives 0 training error. Solution: cluster.

Training the Classifiers. Problem: Euclidean distance provides 0 error. Solution: cluster, and train using the cluster centres. Efficiently solve the SDP using alternating projections (Bauschke and Borwein, 1996).
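The core step of an alternating-projections SDP solver is the projection onto the PSD cone; below is a minimal sketch of that step, not the authors' implementation.

```python
# Projection onto the PSD cone: clip negative eigenvalues to zero.
import numpy as np

def project_psd(M):
    M = (M + M.T) / 2                    # symmetrize first
    w, V = np.linalg.eigh(M)             # eigenvalues w, eigenvectors V
    return (V * np.clip(w, 0, None)) @ V.T

A = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues 3 and -1
print(project_psd(A))                    # nearest PSD matrix in Frobenius norm
```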

Testing the Classifiers. Map all training points using L; map the test point using L; find the nearest neighbours; classify. Measure accuracy = (number of true positives) / (number of test faces).
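What the test loop amounts to, sketched with scikit-learn; the data arrays and the mapping L below are placeholders for the outputs of the training stage.

```python
# Sketch of the test stage: nearest-neighbour classification in the learnt space.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))          # stand-in training features
y_train = rng.integers(0, 11, size=100)      # 11 character labels, as in the dataset
X_test = rng.normal(size=(10, 5))
y_test = rng.integers(0, 11, size=10)
L = np.eye(5)                                # stand-in for the learnt mapping

knn = KNeighborsClassifier(n_neighbors=3)    # Euclidean distance in the mapped space
knn.fit(X_train @ L.T, y_train)              # map all training points using L
pred = knn.predict(X_test @ L.T)             # map the test points using L, classify
print(np.mean(pred == y_test))               # accuracy = true positives / test faces
```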

Timings.
Method      Training   Testing
kNN-E       -          62.2 s
L2-LMNN     4 h        62.2 s
D-LMNN      1 h        53.2 s
DD-LMNN     2 h        50.5 s
L2-ILMNN    24 h       62.2 s
D-ILMNN     8 h        48.2 s
DD-ILMNN    24 h       51.9 s
M-SVM       300 s      446.6 s
SVM-KNN     (times not recovered)

Accuracy. [Table: accuracy on Experiments 1 and 2 for kNN-E, L2-LMNN, D-LMNN, DD-LMNN, L2-ILMNN, D-ILMNN, DD-ILMNN, M-SVM and SVM-KNN; values not recovered]

True Positives

Conclusions. Regularizers make LMNN more accurate than nearest neighbour; adding invariance (ILMNN) makes it more accurate than LMNN.

Future Research. D-LMNN and D-ILMNN for the chi-squared distance. D-LMNN and D-ILMNN for the dot-product distance. Handling missing data (Sivaswamy, Bhattacharya, Smola, JMLR 2006). Learning local mappings (adaptive kNN).

Questions?

False Positives

Precision-Recall Curves: Experiment 1

Precision-Recall Curves: Experiment 2