Nearest Neighbour Condensing and Editing David Claus February 27, 2004 Computer Vision Reading Group Oxford.

Nearest Neighbour Rule
Non-parametric pattern classification.
Consider a two-class problem where each sample consists of two measurements (x, y).
k = 1: For a given query point q, assign the class of the nearest neighbour.
k = 3: Compute the k nearest neighbours and assign the class by majority vote.
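The rule above can be sketched in a few lines of Python. This is a minimal brute-force illustration, not code from the talk: the function name knn_classify, the toy data and the tie-breaking behaviour (first label seen among equally common ones) are all assumptions made for the example.

```python
from collections import Counter
import math

def knn_classify(train, query, k=1):
    """train: list of ((x, y), label) pairs. Returns the majority label
    among the k training samples closest to query (Euclidean distance)."""
    by_distance = sorted(train, key=lambda s: math.dist(s[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-class data: one "b" sample sits close to the "a" cluster.
train = [((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((0.0, 1.0), "a"),
         ((3.0, 3.0), "b"), ((4.0, 3.0), "b"), ((1.2, 1.2), "b")]
q = (1.0, 1.0)

print(knn_classify(train, q, k=1))  # b: the single nearest sample is (1.2, 1.2)
print(knn_classify(train, q, k=3))  # a: two of the three nearest samples are "a"
```

The same query point receives a different class for k = 1 and k = 3, which is the effect the two panels of the slide illustrate.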

Example: Digit Recognition
Yann LeCun: MNIST Digit Recognition
  – Handwritten digits
  – 28x28 pixel images: d = 784
  – 60,000 training samples
  – 10,000 test samples
Nearest neighbour is competitive

Method                                      Test Error Rate (%)
Linear classifier (1-layer NN)              12.0
K-nearest-neighbors, Euclidean              5.0
K-nearest-neighbors, Euclidean, deskewed    2.4
K-NN, Tangent Distance, 16x16               1.1
K-NN, shape context matching                (not given)
RBF + linear classifier                     3.6
SVM deg 4 polynomial                        1.1
2-layer NN, 300 hidden units                4.7
2-layer NN, 300 HU, [deskewing]             1.6
LeNet-5, [distortions]                      0.8
Boosted LeNet-4, [distortions]              0.7

Nearest Neighbour Issues
Expensive
  – To determine the nearest neighbour of a query point q, must compute the distance to all N training examples
  + Pre-sort training examples into fast data structures (kd-trees)
  + Find only an approximate nearest neighbour (LSH)
  + Remove redundant data (condensing)
Storage Requirements
  – Must store all training data P
  + Remove redundant data (condensing)
  - Pre-sorting often increases the storage requirements
High Dimensional Data
  – “Curse of Dimensionality”
      Required amount of training data increases exponentially with dimension
      Computational cost also increases dramatically
      Partitioning techniques degrade to linear search in high dimension

Questions
What distance measure to use?
  – Often Euclidean distance is used
  – Locally adaptive metrics
  – More complicated with non-numeric data, or when different dimensions have different scales
Choice of k?
  – Cross-validation
  – 1-NN often performs well in practice
  – k-NN needed for overlapping classes
  – Re-label all data according to k-NN, then classify with 1-NN
  – Reduce k-NN problem to 1-NN through dataset editing
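As one possible reading of "Choice of k? Cross-validation", the sketch below picks k by leave-one-out error on the training set. The helper names (knn_label, loo_error, choose_k), the candidate values and the toy data are hypothetical; any other cross-validation scheme would serve equally well.

```python
from collections import Counter
import math

def knn_label(train, query, k):
    """Majority label among the k nearest training samples."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def loo_error(data, k):
    """Leave-one-out error rate: classify each sample using all the others."""
    errors = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_label(rest, x, k) != label:
            errors += 1
    return errors / len(data)

def choose_k(data, candidates=(1, 3, 5)):
    # min returns the first candidate with the lowest error,
    # so ties are resolved in favour of the smaller k.
    return min(candidates, key=lambda k: loo_error(data, k))

# Hypothetical data: two clusters plus one "b" sample inside the "a" cluster.
data = [((0.0, 0.0), "a"), ((1.1, 0.0), "a"), ((2.9, 0.0), "a"), ((4.2, 0.0), "a"),
        ((2.0, 0.0), "b"),
        ((10.0, 0.0), "b"), ((11.3, 0.0), "b"), ((12.1, 0.0), "b"), ((13.6, 0.0), "b")]

for k in (1, 3, 5):
    print(k, loo_error(data, k))
print("chosen k:", choose_k(data))
```

On this data the stray sample misleads 1-NN for its two neighbours, while k = 3 and k = 5 misclassify only the stray sample itself.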

Exact Nearest Neighbour
Asymptotic error (infinite sample size) is less than twice the Bayes classification error
  – Requires a lot of training data
Expensive for high dimensional data (d > 20?)
O(Nd) complexity for both storage and query time
  – N is the number of training examples, d is the dimension of each sample
  – This can be reduced through dataset editing/condensing
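The "less than twice the Bayes error" statement is usually written as the Cover and Hart (1967) bound. The symbols below are notation introduced here, not taken from the slides: P^* is the Bayes error, P the asymptotic 1-NN error and c the number of classes.

```latex
P^{*} \;\le\; P \;\le\; P^{*}\left(2 - \frac{c}{c-1}\,P^{*}\right) \;\le\; 2P^{*}

% Worked example for two classes (c = 2) with Bayes error P^* = 0.1:
% P \le 0.1 \,(2 - 2 \cdot 0.1) = 0.18
```

The bound describes the infinite-sample limit only, which is why the slide pairs it with the need for a lot of training data.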

Decision Regions
Each cell contains one sample, and every location within the cell is closer to that sample than to any other sample.
A Voronoi diagram divides the space into such cells.
Every query point will be assigned the classification of the sample within that cell.
The decision boundary separates the class regions based on the 1-NN decision rule.
Knowledge of this boundary is sufficient to classify new points.
The boundary itself is rarely computed; many algorithms seek to retain only those points necessary to generate an identical boundary.

Condensing
Aim is to reduce the number of training samples
Retain only the samples that are needed to define the decision boundary
This is reminiscent of a Support Vector Machine
Decision Boundary Consistent: a subset whose nearest neighbour decision boundary is identical to the boundary of the entire training set
Minimum Consistent Set: the smallest subset of the training data that correctly classifies all of the original training data
[Figure panels: Original data, Condensed data, Minimum Consistent Set]

Condensing
Condensed Nearest Neighbour (CNN), Hart 1968
  – Incremental
  – Order dependent
  – Neither minimal nor decision boundary consistent
  – O(n^3) for brute-force method
  – Can follow up with reduced NN [Gates72]: remove a sample if doing so does not cause any incorrect classifications
1. Initialize subset with a single training example
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset
3. Return to 2 until no transfers occurred or the subset is full
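The three steps can be sketched as follows. This is an illustrative brute-force version: the names condense and nearest_label and the toy data are assumptions, and the result depends on the order of the input, as the slide notes.

```python
import math

def nearest_label(subset, x):
    """1-NN label of x with respect to subset."""
    return min(subset, key=lambda s: math.dist(s[0], x))[1]

def condense(data):
    """Hart-style condensing over a list of (point, label) pairs."""
    subset = [data[0]]                  # step 1: start with a single example
    remaining = list(data[1:])
    transferred = True
    while transferred and remaining:    # step 3: stop after a pass with no transfers
        transferred = False
        still_remaining = []
        for sample in remaining:        # step 2: classify using the current subset
            x, label = sample
            if nearest_label(subset, x) != label:
                subset.append(sample)   # misclassified: transfer to the subset
                transferred = True
            else:
                still_remaining.append(sample)
        remaining = still_remaining
    return subset

# Hypothetical data: two well-separated clusters.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((1, 1), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"), ((6, 6), "b")]
subset = condense(data)
print(len(subset), "of", len(data), "samples kept")
```

Provided no two identical points carry different labels, the returned subset classifies every original training sample correctly with 1-NN, although it is generally neither minimal nor decision boundary consistent.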

Proximity Graphs
Condensing aims to retain points along the decision boundary
How to identify such points?
  – Neighbouring points of different classes
Proximity graphs provide various definitions of “neighbour”
  NNG = Nearest Neighbour Graph
  MST = Minimum Spanning Tree
  RNG = Relative Neighbourhood Graph
  GG = Gabriel Graph
  DT = Delaunay Triangulation

Proximity Graphs: Delaunay
The Delaunay Triangulation is the dual of the Voronoi diagram
Three points are each other's neighbours if the sphere passing through them (their circumscribing sphere) contains no other points
Voronoi editing: retain those points that have a neighbour (as defined by the Delaunay Triangulation) of the opposite class
  – The decision boundary is identical
  – Conservative subset
  – Retains extra points
  – Expensive to compute in high dimensions

Proximity Graphs: Gabriel
The Gabriel graph is a subset of the Delaunay Triangulation
Points are neighbours only if their (diametral) sphere of influence is empty
Does not preserve the identical decision boundary, but most changes occur outside the convex hull of the data points
Can be computed more efficiently
[Figure: green lines denote “Tomek links”]
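A brute-force sketch of the Gabriel neighbour test and the corresponding editing step is given below. It assumes the usual definition: a point r lies strictly inside the diametral sphere of p and q exactly when d(p,r)^2 + d(q,r)^2 < d(p,q)^2. The function names and the toy data are hypothetical, and the triple loop is O(n^3), so this is for illustration only.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def gabriel_edges(points):
    """Index pairs (i, j) whose diametral sphere contains no other point."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(sq_dist(points[i], points[r]) + sq_dist(points[j], points[r]) >= d_ij
               for r in range(len(points)) if r not in (i, j)):
            edges.append((i, j))
    return edges

def gabriel_edit(points, labels):
    """Indices of points with at least one Gabriel neighbour of another class."""
    keep = set()
    for i, j in gabriel_edges(points):
        if labels[i] != labels[j]:
            keep.update((i, j))
    return sorted(keep)

# Hypothetical data: six points on a line, class changes between index 2 and 3.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
labels = ["a", "a", "a", "b", "b", "b"]
print(gabriel_edit(points, labels))  # [2, 3]
```

Only the two points on either side of the class change are retained; the interior points of each class have no opposite-class Gabriel neighbour.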

Proximity Graphs: RNG
The Relative Neighbourhood Graph (RNG) is a subset of the Gabriel graph
Two points are neighbours if the “lune” defined by the intersection of their radial spheres (the spheres centred on each point with radius equal to the distance between them) is empty
Further reduces the number of neighbours
Decision boundary changes are often drastic, and not guaranteed to be training set consistent
[Figure panels: Gabriel edited; RNG edited (not consistent)]
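The lune test can be written as: p and q are relative neighbours if no other point r satisfies max(d(p,r), d(q,r)) < d(p,q). The sketch below is a brute-force illustration under that definition; the function name rng_edges and the three-point example are assumptions chosen to show a pair that passes the Gabriel test but fails the RNG test.

```python
from itertools import combinations

def sq_dist(a, b):
    """Squared Euclidean distance (comparisons of squares preserve ordering)."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def rng_edges(points):
    """Index pairs (i, j) whose lune contains no other point."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        lune_empty = all(
            max(sq_dist(points[i], points[r]), sq_dist(points[j], points[r])) >= d_ij
            for r in range(len(points)) if r not in (i, j))
        if lune_empty:
            edges.append((i, j))
    return edges

# Hypothetical triangle: the apex (index 2) lies inside the lune of the base
# points (indices 0 and 1) but outside their diametral sphere.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)]
print(rng_edges(points))  # [(0, 2), (1, 2)]
```

Here the base pair (0, 1) would be Gabriel neighbours, since 3.25 + 3.25 >= 4, yet it is dropped by the RNG, which is how the RNG reduces the number of neighbours further.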

Matlab demo

Dataset Reduction: Editing
Training data may contain noise, overlapping classes
  – starting to make assumptions about the underlying distributions
Editing seeks to remove noisy points and produce smooth decision boundaries, often by retaining points far from the decision boundaries
Results in homogeneous clusters of points

Wilson Editing
Wilson 1972
Remove points that do not agree with the majority of their k nearest neighbours
[Figure panels: earlier example, original data and Wilson editing with k=7; overlapping classes, original data and Wilson editing with k=7]
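A minimal sketch of this rule follows. The function name wilson_edit, the default k = 3 and the toy data are assumptions; every sample is judged against the original, unedited data, which is one common reading of the method.

```python
from collections import Counter
import math

def wilson_edit(data, k=3):
    """Keep only samples whose label matches the majority of their
    k nearest neighbours (the sample itself is excluded from the vote)."""
    kept = []
    for i, (x, label) in enumerate(data):
        others = data[:i] + data[i + 1:]
        nearest = sorted(others, key=lambda s: math.dist(s[0], x))[:k]
        majority = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if majority == label:
            kept.append((x, label))
    return kept

# Hypothetical data: one "b" sample sits inside the "a" cluster.
data = [((0.0, 0.0), "a"), ((1.1, 0.0), "a"), ((2.9, 0.0), "a"), ((4.2, 0.0), "a"),
        ((2.0, 0.0), "b"),
        ((10.0, 0.0), "b"), ((11.3, 0.0), "b"), ((12.1, 0.0), "b"), ((13.6, 0.0), "b")]
kept = wilson_edit(data, k=3)
print(len(kept), "of", len(data), "samples kept")
```

With k = 3 only the stray "b" sample disagrees with its neighbours, so it alone is removed.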

Multi-edit
Multi-edit [Devijver & Kittler ’79]
  – Repeatedly apply Wilson editing to random partitions
  – Classify with the 1-NN rule
Approximates the error rate of the Bayes decision rule
1. Diffusion: divide data into N ≥ 3 random subsets
2. Classification: classify S_i using 1-NN with S_((i+1) mod N) as the training set (i = 1..N)
3. Editing: discard all samples incorrectly classified in (2)
4. Confusion: pool all remaining samples into a new set
5. Termination: if the last I iterations produced no editing then end; otherwise go to (1)
[Figure panels: Multi-edit, 8 iterations (last 3 same); Voronoi editing]
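The five steps might be sketched as below. The parameter names are hypothetical: n_subsets plays the role of N, patience the role of I, and seed fixes the random partition for repeatability. Subsets are indexed from 0 rather than 1, and the loop also stops if fewer than n_subsets samples remain, which is an added safeguard rather than part of the slide.

```python
import math
import random

def nearest_label(train, x):
    """1-NN label of x with respect to train."""
    return min(train, key=lambda s: math.dist(s[0], x))[1]

def multi_edit(data, n_subsets=3, patience=3, seed=0):
    rng = random.Random(seed)
    current = list(data)
    quiet = 0  # consecutive iterations in which nothing was discarded
    while quiet < patience and len(current) >= n_subsets:
        # 1. Diffusion: random split into n_subsets parts
        rng.shuffle(current)
        subsets = [current[i::n_subsets] for i in range(n_subsets)]
        kept = []
        for i, subset in enumerate(subsets):
            train = subsets[(i + 1) % n_subsets]
            # 2. Classification and 3. Editing: keep correctly classified samples
            kept.extend(s for s in subset if nearest_label(train, s[0]) == s[1])
        # 5. Termination bookkeeping
        quiet = quiet + 1 if len(kept) == len(current) else 0
        # 4. Confusion: pool the survivors for the next round
        current = kept
    return current

# Hypothetical data: two clusters plus one "b" sample inside the "a" cluster.
mixed = ([((float(i), 0.0), "a") for i in range(6)]
         + [((2.5, 0.1), "b")]
         + [((float(i), 0.0), "b") for i in range(20, 26)])
edited = multi_edit(mixed, seed=1)
print(len(edited), "of", len(mixed), "samples kept")
```

Because the partitions are random, the set of retained samples can vary with the seed; only the general behaviour (removal of samples that disagree with their surroundings) should be expected.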

Combined Editing/Condensing
First edit the data to remove noise and smooth the boundary
Then condense to obtain a smaller subset

Where are we?
Simple method, pretty powerful rule
Can be made to run fast
Requires a lot of training data
Edit to reduce noise, class overlap
Condense to remove redundant data
