Locally Linear Support Vector Machines. Ľubor Ladický, Philip H.S. Torr.

Presentation transcript:

Locally Linear Support Vector Machines
Ľubor Ladický, Philip H.S. Torr

Binary classification task
Given: a training set $\{[x_1, y_1], [x_2, y_2], \ldots, [x_n, y_n]\}$, where $x_i$ is a feature vector and $y_i \in \{-1, 1\}$ is its label.
The task is to design a classifier $y = H(x)$ that predicts the label $y$ of a new sample $x$.
Several approaches have been proposed: Support Vector Machines, Boosting, Random Forests, Neural Networks, Nearest Neighbour, ...

Linear vs. Kernel SVMs
Linear SVMs:
- fast training and evaluation
- applicable to large-scale data sets
- low discriminative power
Kernel SVMs:
- slow training and evaluation
- not feasible for very large data sets
- much better performance on hard problems

Motivation
The goal is to design an SVM with:
- a good trade-off between performance and speed
- scalability to large-scale data sets (solvable using SGD)

Local codings
Each point is approximated as a weighted sum of anchor points:
$x \approx \sum_{v \in C} \gamma_v(x)\, v$
where $v$ are the anchor points and $\gamma_v(x)$ are the local coordinates of $x$.
The coordinates can be obtained using:
- distance-based methods (Gemert et al. 2008, Zhou et al. 2009)
- reconstruction methods (Roweis et al. 2000, Yu et al. 2009, Gao et al. 2010)

Local codings
For a normalised coding ($\sum_{v \in C} \gamma_v(x) = 1$) and any Lipschitz function $f$ (Yu et al. 2009):
$f(x) \approx \sum_{v \in C} \gamma_v(x)\, f(v)$
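As an illustration of a distance-based coding, here is a minimal Python sketch computing normalised inverse-distance coordinates over the k nearest anchors (the scheme used in the experiments later in the talk; all names are mine, and the anchors would typically come from k-means):

```python
import numpy as np

def local_coordinates(x, anchors, k=8, eps=1e-8):
    """Normalised inverse-distance coding of x over its k nearest anchors.

    x       : (d,) feature vector
    anchors : (m, d) anchor points, e.g. k-means centres
    returns : (m,) coordinate vector gamma, zero outside the k neighbours
    """
    dists = np.linalg.norm(anchors - x, axis=1)
    nearest = np.argsort(dists)[:k]            # indices of the k closest anchors
    weights = 1.0 / (dists[nearest] + eps)     # inverse-distance weights
    gamma = np.zeros(len(anchors))
    gamma[nearest] = weights / weights.sum()   # normalise so the coding sums to 1
    return gamma
```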

Linear SVMs
The classifier takes the form $H(x) = \mathrm{sign}(w^T x + b)$.
The weights $w$ and bias $b$ are obtained as (Vapnik & Lerner 1963, Cortes & Vapnik 1995):
$[w, b] = \arg\min_{w, b} \frac{\lambda}{2} \|w\|^2 + \frac{1}{S} \sum_{k=1}^{S} \max\big(0, 1 - y_k (w^T x_k + b)\big)$
where $[x_k, y_k]$ are the training samples, $\lambda$ is the regularisation weight and $S$ is the number of samples.
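For concreteness, a short numpy sketch of this regularised hinge-loss objective (the function name is mine):

```python
import numpy as np

def linear_svm_objective(w, b, X, y, lam):
    """lam/2 * ||w||^2 + mean hinge loss over the training set.

    X : (S, d) training samples, y : (S,) labels in {-1, +1}
    """
    margins = y * (X @ w + b)                  # y_k * (w^T x_k + b)
    hinge = np.maximum(0.0, 1.0 - margins)     # per-sample hinge loss
    return 0.5 * lam * np.dot(w, w) + hinge.mean()
```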

Locally Linear SVMs
- The decision boundary should be smooth, i.e. approximately linear in a sufficiently small region.
- Its curvature is bounded, so the functions $w_i(x)$ and $b(x)$ are Lipschitz.
- Hence $w_i(x)$ and $b(x)$ can be approximated using a local coding:
$w_i(x) \approx \sum_{v \in C} \gamma_v(x)\, w_i(v)$,  $b(x) \approx \sum_{v \in C} \gamma_v(x)\, b(v)$

Locally Linear SVMs
The classifier takes the form:
$H(x) = \mathrm{sign}\big(\gamma(x)^T W x + \gamma(x)^T b\big)$
The weight matrix $W$ and biases $b$ are obtained as:
$[W, b] = \arg\min_{W, b} \frac{\lambda}{2} \|W\|^2 + \frac{1}{S} \sum_{k=1}^{S} \max\big(0, 1 - y_k (\gamma(x_k)^T W x_k + \gamma(x_k)^T b)\big)$
where $\gamma(x_k)$ are the local coordinates of training sample $x_k$.
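A direct numpy transcription of this decision function, reusing local_coordinates from the sketch above (illustrative only, not the authors' code):

```python
def ll_svm_predict(x, anchors, W, b, k=8):
    """Locally linear SVM decision: sign(gamma(x)^T W x + gamma(x)^T b).

    W : (m, d) one weight vector per anchor; b : (m,) one bias per anchor.
    """
    gamma = local_coordinates(x, anchors, k)   # local coding of x
    score = gamma @ (W @ x) + gamma @ b        # gamma^T W x + gamma^T b
    return 1 if score >= 0 else -1
```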

Locally Linear SVMs
The resulting objective is optimised using stochastic gradient descent (Bordes et al. 2005).
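The slide's update equations are not preserved in the transcript; below is a plausible Pegasos-style reconstruction of the training loop (my sketch, assuming a 1/(lambda*t) step size and an unregularised bias; the authors' exact schedule may differ):

```python
import numpy as np

def ll_svm_sgd(X, y, anchors, lam=1e-4, epochs=10, k=8, seed=0):
    """Stochastic subgradient descent for the locally linear SVM objective."""
    m, d = anchors.shape
    W = np.zeros((m, d))
    b = np.zeros(m)
    rng = np.random.default_rng(seed)
    t = 1
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            gamma = local_coordinates(X[i], anchors, k)
            violated = y[i] * (gamma @ (W @ X[i]) + gamma @ b) < 1
            W *= 1.0 - 1.0 / t                 # shrinkage from the L2 regulariser
            if violated:                       # hinge subgradient step
                eta = 1.0 / (lam * t)
                W += eta * y[i] * np.outer(gamma, X[i])
                b += eta * y[i] * gamma
            t += 1
    return W, b
```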

Relation to other models
- Generalisation of a linear SVM on $x$: representable by $W = (w\ w\ \ldots)^T$ and $b = (b'\ b'\ \ldots)^T$ (all anchors share the same weights).
- Generalisation of a linear SVM on $\gamma$: representable by $W = 0$ and $b = w$.
- Generalisation of a model-selecting Latent (MI)-SVM: representable by $\gamma = (0, 0, \ldots, 1, \ldots, 0)$ (a hard assignment to a single local model).

Extension to finite kernels
The finite-kernel classifier takes the form:
$H(x) = \mathrm{sign}\big(\gamma(x)^T W\, k(x) + \gamma(x)^T b\big)$
where $k(x) = (K(x, z_1), \ldots, K(x, z_N))^T$ and $K$ is the kernel function, evaluated against $N$ selected points $z_j$.
The weights $W$ and $b$ are obtained by minimising the same regularised hinge loss as above, with $x_k$ replaced by $k(x_k)$.
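A sketch of this evaluation with a histogram intersection kernel (the kernel used in the Caltech-101 experiments below; the basis points Z are hypothetical and would be chosen from the training set):

```python
import numpy as np

def intersection_kernel(x, Z):
    """Histogram intersection kernel of x against each row of Z."""
    return np.minimum(x, Z).sum(axis=1)        # (N,) kernel evaluations

def ll_svm_kernel_predict(x, anchors, Z, W, b, k=8):
    """Finite-kernel LL-SVM: x is replaced by k(x) = (K(x, z_1), ..., K(x, z_N))."""
    kx = intersection_kernel(x, Z)             # (N,) vector of kernel values
    gamma = local_coordinates(x, anchors, k)   # coding still in the input space
    score = gamma @ (W @ kx) + gamma @ b       # W is now (m, N)
    return 1 if score >= 0 else -1
```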

Experiments
MNIST, LETTER & USPS data sets:
- anchor points obtained using k-means clustering
- coordinates evaluated on the k nearest neighbours, k = 8 (the slow part)
- coordinates obtained using weighted inverse distance
- raw data used
CALTECH-101 (15 training samples per class):
- coordinates evaluated on the k nearest neighbours, k = 5
- approximated intersection kernel used (Vedaldi & Zisserman 2010)
- spatial pyramid of BoW features
- coordinates evaluated based on a histogram over the whole image

Experiments: results on MNIST (comparison table shown as a figure in the original slides).

Experiments: results on UIUC / LETTER and Caltech-101 (comparison tables shown as figures in the original slides).

Conclusions
We propose a novel Locally Linear SVM formulation with:
- a good trade-off between speed and performance
- scalability to large-scale data sets
- an easy implementation
The optimal way of learning the anchor points remains an open question.
Questions?