Fast Query-Optimized Kernel Machine Classification Via Incremental Approximate Nearest Support Vectors by Dennis DeCoste and Dominic Mazzoni International Conference on Machine Learning (ICML-03), August 2003 Presented by Despina Kontos CIS 525 Neural Computation, Spring 2004 Instructor: S. Vucetic

Overview
Introduction
 Motivation and the main idea.
Background and related work
 A little bit about Kernel Machines (KMs) and previous work.
Methodology
 The Nearest Support Vectors (NSV) approach.
 Some enhancements.
Experiments and results
Discussion

Introduction
Why Kernel Machines?
 Using kernel functions, they can explore very large nonlinear feature spaces while sidestepping the “curse of dimensionality” of computing in those spaces explicitly.
What is the problem?
 The tradeoff for this power is that a KM's query-time complexity scales linearly with the number of support vectors (SVs), often making KMs orders of magnitude more expensive at query time than other popular machine learning alternatives.
 KM cost is identical for every query, even for “easy” ones that alternatives (e.g. decision trees) can classify much faster than harder ones.

Introduction
So, what would be an ideal approach?
 Use a simple linear classifier for the (majority of) queries it is likely to classify correctly.
 Incur the query-time cost of the exact KM only for those queries for which such precision likely matters.
 For the rest of the queries, use something in between, with complexity proportional to the difficulty of the query.
A new idea!
 One can often achieve the same classification as the exact KM by using only a small fraction of the support vectors (SVs) nearest to the query.
 Approximate the exact KM with a k-nearest-neighbor (k-NN) KM, whose output sums only over the (weighted) kernel values involving the k nearest (according to some distance) selected SVs.

Background
Kernel Machines summary
 A binary SVM classifier is trained by optimizing an n-by-1 weighting vector α to satisfy the Quadratic Programming (QP) dual form: maximize Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j K(X_i, X_j), subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
 The kernel avoids the curse of dimensionality by implicitly projecting any two d-dimensional example vectors u and v into feature space and returning the dot product of their images: K(u, v) = Φ(u) · Φ(v).
 Popular kernels include the linear kernel K(u, v) = u · v, the polynomial kernel K(u, v) = (u · v + 1)^p, and the RBF kernel K(u, v) = exp(-γ ||u - v||^2).
 The exact KM output f(x) is computed via f(x) = Σ_i β_i K(X_i, x) + b, where β_i = α_i y_i and the sum runs over the support vectors (the examples with α_i > 0).
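As a concrete reference point, here is a minimal sketch of the exact KM decision function with an RBF kernel; the names (rbf_kernel, exact_km_output, gamma) and the default gamma value are illustrative, not taken from the paper.

```python
import numpy as np

def rbf_kernel(u, v, gamma=0.02):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.exp(-gamma * np.sum((u - v) ** 2))

def exact_km_output(x, SVs, beta, b, gamma=0.02):
    # Exact KM output: f(x) = sum_i beta_i * K(X_i, x) + b, summed over ALL support vectors.
    return sum(b_i * rbf_kernel(X_i, x, gamma) for X_i, b_i in zip(SVs, beta)) + b
```

The query-time cost of this exact evaluation is one kernel evaluation per support vector, which is exactly the cost the NSV method tries to avoid paying on every query.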

Some related work
 Early methods compressed a KM's SVs into a smaller “reduced set” of n_z vectors, in order to reduce query-time costs.
 When a small approximation error ρ ≈ 0 can be achieved with n_z « n, large speedups with little loss of classification accuracy have been reported.
 Problem: a key weakness of all such reduced-set approaches is that they provide no guarantees or control over how much classification error the approximation might introduce.

Methodology
The intuition behind the new idea:
 Order the SVs for each query using a distance metric and use the k nearest-neighboring SVs (with respect to the query sample). The largest terms then tend to get added first.
 During incremental computation of the KM output, once the partial sum leans “strongly enough” either positively or negatively, the remaining β_i K(X_i, x) terms cannot completely change its sign.
 Small k-nearest-neighbor classifiers can often classify well, but the best k varies from query to query.

Methodology
Nearest Support Vectors (NSV)
 A distance-like score, NNscore_i(x), is defined for each SV and used to rank the SVs for a given query x, so that the largest-magnitude terms tend to come first (see the illustrative ordering sketch below).
 The β_i K(X_i, x) terms corresponding to the NNscore-ordered SVs tend to follow a steady progression, such that the remaining terms soon become too small to overcome any strong leaning.
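The NNscore formula from the slide does not survive in this transcript, so the sketch below uses an illustrative stand-in: it ranks SVs directly by the magnitude of their contribution β_i K(X_i, x). This captures the stated intuition (largest terms first) but is not the paper's definition, and it would not by itself save kernel evaluations; the real NNscore is a cheap, distance-like proxy for this ordering.

```python
import numpy as np

def order_svs(x, SVs, beta, kernel):
    # Illustrative stand-in for NNscore_i(x): rank the SVs so that the terms
    # beta_i * K(X_i, x) with the largest magnitude come first.
    terms = np.array([b_i * kernel(X_i, x) for X_i, b_i in zip(SVs, beta)])
    return np.argsort(-np.abs(terms))   # indices of SVs, largest |term| first
```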

Methodology
The main algorithm: incrementally accumulate the KM output over the NNscore-ordered SVs, stopping as soon as the partial output clears a per-step confidence threshold (sketched below).
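The algorithm figure from the original slide is not reproduced here; the following is a hedged reconstruction of the loop described by the surrounding slides, assuming the per-step thresholds L[k] and H[k] (next slide) and a query-specific SV ordering have already been computed. All names are illustrative.

```python
def classify_query(x, SVs, beta, b, kernel, order, L, H):
    # Incremental NSV classification: accumulate the partial KM output g_k(x)
    # over the ordered SVs and stop as soon as it clears a per-step threshold.
    g = b
    for k, i in enumerate(order):
        g += beta[i] * kernel(SVs[i], x)
        if g > H[k]:            # no calibration sample the exact KM labels negative ever leaned this high
            return +1, k + 1    # predicted label, number of kernel evaluations used
        if g < L[k]:            # no calibration sample the exact KM labels positive ever leaned this low
            return -1, k + 1
    return (+1 if g > 0 else -1), len(order)   # fell through to the exact KM output
```

Easy queries exit after a handful of kernel evaluations, while hard queries near the decision boundary fall through to the full sum, giving the per-query cost proportional to query difficulty promised in the introduction.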

Methodology
Statistical thresholds for NSV
 Derive thresholds L_k and H_k by running the algorithm over a large representative sample of pre-query data.
 Compute L_k as the minimum value of g_k(x) over all x such that g_k(x) < 0 and f(x) > 0. This identifies L_k as the worst-case wrong-way leaning of any sample that the exact KM classifies as positive. Similarly, H_k is assigned the maximum g_k(x) such that g_k(x) > 0 and f(x) < 0.
 In practice, the test and training data distributions will not be identical. To be conservative, each H_k (L_k) can be replaced with the maximum (minimum) of all threshold values over the adjacent steps k-w through k+w (a variation using a window of width w).
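A minimal sketch of this calibration step, run once over a representative pre-query sample. The window variant with width w is included; `order_fn` (which returns the query-specific SV ordering) and the other names are assumptions for illustration.

```python
import numpy as np

def calibrate_thresholds(cal_X, SVs, beta, b, kernel, order_fn, w=0):
    # For every calibration sample, record the partial outputs g_k(x) at each
    # step k and keep the worst-case "wrong-way" leanings seen so far.
    n_sv = len(SVs)
    L = np.zeros(n_sv)   # most negative g_k(x) of any sample the exact KM calls positive
    H = np.zeros(n_sv)   # most positive g_k(x) of any sample the exact KM calls negative
    for x in cal_X:
        order = order_fn(x)              # query-specific NSV ordering
        g, gs = b, []
        for i in order:
            g += beta[i] * kernel(SVs[i], x)
            gs.append(g)
        f_x = gs[-1]                     # exact KM output (all SVs summed)
        for k, g_k in enumerate(gs):
            if f_x > 0:
                L[k] = min(L[k], g_k)
            else:
                H[k] = max(H[k], g_k)
    if w > 0:                            # window variant: widen thresholds over steps k-w .. k+w
        L = np.array([L[max(0, k - w):k + w + 1].min() for k in range(n_sv)])
        H = np.array([H[max(0, k - w):k + w + 1].max() for k in range(n_sv)])
    return L, H
```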

Methodology
 Sorting NSVs by NNscore_i(x) leads to relatively wide and skewed thresholds whenever there is an imbalance between the number of positive SVs and negative SVs. Fix: adjust the NNscore-based ordering so that the cumulative sums of the positive β's and the negative β's at each step k stay as equal as possible.
 A full linear scan to find the k nearest neighbors can be very computationally expensive, even when using indexing techniques. Fix: perform pre-query principal component analysis (PCA) on the matrix of SVs, and use the resulting small low-dimensional vectors to approximate kernel values and to order the NSVs for each query as needed (see the sketch below).
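A sketch of the pre-query PCA idea, under the assumption that the low-dimensional projections are used only to order the SVs by approximate distance to the query; the function names and the number of components are illustrative.

```python
import numpy as np

def fit_sv_pca(SVs, n_components=20):
    # One-time, pre-query PCA on the matrix of support vectors.
    SV_mat = np.asarray(SVs, dtype=float)
    mean = SV_mat.mean(axis=0)
    _, _, Vt = np.linalg.svd(SV_mat - mean, full_matrices=False)
    components = Vt[:n_components]                 # top principal directions
    SV_proj = (SV_mat - mean) @ components.T       # low-dimensional SV coordinates
    return mean, components, SV_proj

def approx_order(x, mean, components, SV_proj):
    # Order SVs by squared distance to the query in the reduced PCA space,
    # a cheap approximation of the exact NNscore-based ordering.
    x_proj = (np.asarray(x, dtype=float) - mean) @ components.T
    d2 = np.sum((SV_proj - x_proj) ** 2, axis=1)
    return np.argsort(d2)
```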

Methodology
Some enhancements
 Use a linear SVM as an initial filter: compute the threshold bounds as before, except using the linear SVM's output for the first step of the computation (sketched below).
 Generate additional “difficult” data in order to obtain better threshold levels from the representative sample.
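One way the linear-SVM filter could be wired in, as a sketch under the assumption that the bounds L0 and H0 are calibrated on the linear output in the same way as the per-step thresholds above; all names are illustrative.

```python
import numpy as np

def classify_with_linear_filter(x, w_lin, b_lin, L0, H0, fallback_classify):
    # Step 0: a single dot product with a linear SVM trained on the same data.
    g0 = float(np.dot(w_lin, x)) + b_lin
    if g0 > H0:                     # confidently positive after the cheap filter
        return +1
    if g0 < L0:                     # confidently negative
        return -1
    return fallback_classify(x)     # ambiguous: fall through to the incremental kernel machine
```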

Experiments and results
Data: the MNIST dataset (digit recognition), chosen for its large input dimensionality and large number of SVs.

Experiments and results
Speedup advantage compared to accuracy loss.

Conclusions
 A new query-time strategy for Kernel Machines that uses a k-nearest-neighbor approach to improve performance.
 The approach is applicable to any form of Kernel Machine classifier, regardless of how it is trained.
 Some exciting speedup results are reported without significant loss in accuracy.
 Future work: combining the machine learning methods of kernels, nearest neighbors, and decision trees.

Any questions???.....THANK YOU!!!!