Nearest Neighbor Classifiers


Nearest Neighbor Classifiers CSE 6363 – Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

The Nearest Neighbor Classifier Let 𝕏 be the space of all possible patterns for some classification problem. Let 𝐹 be a distance function defined in 𝕏. 𝐹 assigns a distance to every pair 𝑣 1 , 𝑣 2 of objects in 𝕏. Examples of such a distance function 𝐹?

The Nearest Neighbor Classifier Let 𝕏 be the space of all possible patterns for some classification problem. Let 𝐹 be a distance function defined in 𝕏. 𝐹 assigns a distance to every pair 𝑣1, 𝑣2 of objects in 𝕏. Examples of such a distance function 𝐹? The 𝐿2 distance (also known as Euclidean distance, or straight-line distance) for vector spaces: 𝐿2(𝑣1, 𝑣2) = √( Σ_{𝑖=1}^{𝐷} (𝑣1,𝑖 − 𝑣2,𝑖)² ). The 𝐿1 distance (Manhattan distance) for vector spaces: 𝐿1(𝑣1, 𝑣2) = Σ_{𝑖=1}^{𝐷} |𝑣1,𝑖 − 𝑣2,𝑖|.
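As an illustration, both distances can be computed in a few lines of Python (a minimal sketch; the function names are my own, not from the slides):

```python
import math

def l2_distance(v1, v2):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def l1_distance(v1, v2):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(v1, v2))

print(l2_distance((0, 0), (3, 4)))  # 5.0
print(l1_distance((0, 0), (3, 4)))  # 7
```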

The Nearest Neighbor Classifier Let 𝐹 be a distance function defined in 𝕏. 𝐹 assigns a distance to every pair 𝑣1, 𝑣2 of objects in 𝕏. Let 𝑥1, 𝑥2, …, 𝑥𝑁 be training examples. The nearest neighbor classifier classifies any pattern 𝑣 as follows: Find the training example C(𝑣) that is the nearest neighbor of 𝑣 (has the shortest distance to 𝑣 among all training examples). Return the class label of C(𝑣). In short, each test pattern 𝑣 is assigned the class of its nearest neighbor in the training data. C(𝑣) = argmin_{𝑥 ∈ {𝑥1, …, 𝑥𝑁}} 𝐹(𝑣, 𝑥)
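The classification rule can be sketched directly from the definition (a minimal illustration; the toy data and helper names are made up for this example):

```python
def nn_classify(v, training_set, dist):
    """Return the label of v's nearest neighbor in training_set.

    training_set is a list of (pattern, label) pairs; dist plays the
    role of the distance function F in the definition above.
    """
    nearest_pattern, nearest_label = min(
        training_set, key=lambda example: dist(v, example[0]))
    return nearest_label

# Toy data: three labeled 2D patterns, classified with the L1 distance.
train = [((1, 1), "red"), ((5, 5), "green"), ((9, 1), "yellow")]
l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
print(nn_classify((8, 2), train, l1))  # yellow
```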

Example Suppose we have a problem where: We have three classes (red, green, yellow). Each pattern is a two-dimensional vector. Suppose that the training data is given below: Suppose we have a test pattern 𝑣, shown in black. How is 𝑣 classified by the nearest neighbor classifier? (figure: training points and the test pattern plotted on the x-y plane)

Example Suppose we have a problem where: We have three classes (red, green, yellow). Each pattern is a two-dimensional vector. Suppose that the training data is given below: Suppose we have a test pattern 𝑣, shown in black. How is 𝑣 classified by the nearest neighbor classifier? C(𝑣) is marked on the figure. Class of C(𝑣): red. Therefore, 𝑣 is classified as red. (figure: the test pattern linked to its nearest neighbor)

Example Suppose we have a problem where: We have three classes (red, green, yellow). Each pattern is a two-dimensional vector. Suppose that the training data is given below: Suppose we have another test pattern 𝑣, shown in black. How is 𝑣 classified by the nearest neighbor classifier? (figure: training points and the test pattern plotted on the x-y plane)

Example Suppose we have a problem where: We have three classes (red, green, yellow). Each pattern is a two-dimensional vector. Suppose that the training data is given below: Suppose we have another test pattern 𝑣, shown in black. How is 𝑣 classified by the nearest neighbor classifier? C(𝑣) is marked on the figure. Class of C(𝑣): yellow. Therefore, 𝑣 is classified as yellow. (figure: the test pattern linked to its nearest neighbor)

Example Suppose we have a problem where: We have three classes (red, green, yellow). Each pattern is a two-dimensional vector. Suppose that the training data is given below: Suppose we have another test pattern 𝑣, shown in black. How is 𝑣 classified by the nearest neighbor classifier? (figure: training points and the test pattern plotted on the x-y plane)

Example Suppose we have a problem where: We have three classes (red, green, yellow). Each pattern is a two-dimensional vector. Suppose that the training data is given below: Suppose we have another test pattern 𝑣, shown in black. How is 𝑣 classified by the nearest neighbor classifier? C(𝑣) is marked on the figure. Class of C(𝑣): yellow. Therefore, 𝑣 is classified as yellow. (figure: the test pattern linked to its nearest neighbor)

The 𝐾-Nearest Neighbor Classifier Instead of classifying the test pattern based on its nearest neighbor, we can take more neighbors into account. This is called 𝑘-nearest neighbor classification. Using more neighbors can help avoid mistakes due to noisy data. In the example shown on the figure, the test pattern is in a mostly “yellow” area, but its nearest neighbor is red. If we use the 3 nearest neighbors, 2 of them are yellow.
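The 𝑘-nearest neighbor rule only changes the last step: collect the 𝑘 closest training examples and take a majority vote. A sketch with made-up toy data mirroring the figure's situation, where the single nearest neighbor is red but two of the three nearest are yellow:

```python
from collections import Counter

def knn_classify(v, training_set, dist, k):
    """Classify v by majority vote among its k nearest training examples."""
    neighbors = sorted(training_set,
                       key=lambda example: dist(v, example[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((2, 2), "red"), ((3, 2), "yellow"),
         ((2, 3), "yellow"), ((9, 9), "green")]
l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
print(knn_classify((2.1, 2), train, l1, k=1))  # red
print(knn_classify((2.1, 2), train, l1, k=3))  # yellow
```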

Normalizing Dimensions Suppose that your test patterns are 2-dimensional vectors, representing stars. The first dimension is surface temperature, measured in Fahrenheit. Your second dimension is mass, measured in pounds. The surface temperature can vary from 6,000 degrees to 100,000 degrees. The mass can vary from 10^29 to 10^32 pounds. Does it make sense to use the Euclidean distance or the Manhattan distance here?

Normalizing Dimensions Suppose that your test patterns are 2-dimensional vectors, representing stars. The first dimension is surface temperature, measured in Fahrenheit. Your second dimension is mass, measured in pounds. The surface temperature can vary from 6,000 degrees to 100,000 degrees. The mass can vary from 10^29 to 10^32 pounds. Does it make sense to use the Euclidean distance or the Manhattan distance here? No. These distances treat both dimensions equally, and assume that they are both measured in the same units. Applied to these data, the distances would be dominated by differences in mass, and would mostly ignore information from surface temperatures.

Normalizing Dimensions It would make sense to use the Euclidean or Manhattan distance, if we first normalized dimensions, so that they contribute equally to the distance. How can we do such normalizations? There are various approaches. Two common approaches are: Translate and scale each dimension so that its minimum value is 0 and its maximum value is 1. Translate and scale each dimension so that its mean value is 0 and its standard deviation is 1.

Normalizing Dimensions – A Toy Example

                 Original Data            Min = 0, Max = 1    Mean = 0, std = 1
Object ID   Temp. (F)   Mass (lb.)    Temp.     Mass       Temp.     Mass
1           4700        1.5×10^30     0.0000    0.0108    −0.9802   −0.6029
2           11000       3.5×10^30     0.1525    0.0377    −0.5375   −0.5322
3           46000       7.5×10^31     1.0000    1.0000     1.9218    1.9931
4           12000       5.0×10^31     0.1768    0.6635    −0.4673    1.1101
5           20000       7.0×10^29     0.3705    0.0000     0.0949   −0.6311
6           13000       2.0×10^30     0.2010    0.0175    −0.3970   −0.5852
7           8500        8.5×10^29     0.0920    0.0020    −0.7132   −0.6258
8           34000       1.5×10^31     0.7094    0.1925     1.0786   −0.1260
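Both normalizations are easy to check in code. The sketch below reproduces the temperature columns of the table, assuming the table used the sample standard deviation (dividing by N − 1), which matches its z-scores:

```python
import statistics

temps = [4700, 11000, 46000, 12000, 20000, 13000, 8500, 34000]

# Min = 0, Max = 1: translate and scale by the range of the dimension.
lo, hi = min(temps), max(temps)
min_max = [(t - lo) / (hi - lo) for t in temps]

# Mean = 0, std = 1: subtract the mean, divide by the standard deviation.
mu = statistics.mean(temps)
sigma = statistics.stdev(temps)  # sample standard deviation (divides by N - 1)
z_scores = [(t - mu) / sigma for t in temps]

print(round(min_max[1], 4))   # 0.1525  (object 2)
print(round(z_scores[0], 4))  # -0.9802 (object 1)
```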

Scaling to Lots of Data Nearest neighbor classifiers become more accurate as we get more and more training data. Theoretically, one can prove that 𝑘-nearest neighbor classifiers become Bayes classifiers (and thus optimal) as training data approaches infinity. One big problem, as training data becomes very large, is time complexity. What takes time? Computing the distances between the test pattern and all training examples.

Nearest Neighbor Search The problem of finding the nearest neighbors of a pattern is called “nearest neighbor search”. Suppose that we have 𝑁 training examples. Suppose that each example is a 𝐷-dimensional vector. What is the time complexity of finding the nearest neighbors of a test pattern?

Nearest Neighbor Search The problem of finding the nearest neighbors of a pattern is called “nearest neighbor search”. Suppose that we have 𝑁 training examples. Suppose that each example is a 𝐷-dimensional vector. What is the time complexity of finding the nearest neighbors of a test pattern? 𝑂(𝑁𝐷). We need to consider each dimension of each training example. This complexity is linear, and we are used to thinking that linear complexity is not that bad. If we have millions, or billions, or trillions of data, the actual time it takes to find nearest neighbors can be a big problem for real-world applications.
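The 𝑂(𝑁𝐷) computation is just a loop over the training examples (a minimal sketch; `math.dist` computes the Euclidean distance between two points):

```python
import math

def brute_force_nn(v, examples, dist):
    """O(N*D) brute-force nearest neighbor search: one distance
    computation per training example, each costing O(D)."""
    best, best_distance = None, float("inf")
    for x in examples:
        d = dist(v, x)
        if d < best_distance:
            best, best_distance = x, d
    return best, best_distance

examples = [(1, 1), (5, 5), (9, 1)]
print(brute_force_nn((8, 2), examples, math.dist)[0])  # (9, 1)
```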

Indexing Methods As we just mentioned, measuring the distance between the test pattern and each training example takes 𝑂(𝑁𝐷) time. This method of finding nearest neighbors is called “brute-force search”, because we go through all the training data. There are methods for finding nearest neighbors that are sublinear in 𝑁 (even logarithmic, at times), but exponential in 𝐷. Can you think of an example?

Indexing Methods As we just mentioned, measuring the distance between the test pattern and each training example takes 𝑂(𝑁𝐷) time. This method of finding nearest neighbors is called “brute-force search”, because we go through all the training data. There are methods for finding nearest neighbors that are sublinear in 𝑁 (even logarithmic, at times), but exponential in 𝐷. Can you think of an example? Binary search (applicable when 𝐷 = 1) takes 𝑂(log 𝑁) time.
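For 𝐷 = 1 the logarithmic method is just binary search over sorted training values: locate the insertion point of the query, then compare the (at most) two surrounding values. A sketch using Python's bisect module:

```python
import bisect

def nn_1d(v, sorted_values):
    """Nearest neighbor of v among sorted_values, in O(log N) time.

    Assumes sorted_values is sorted in increasing order.
    """
    i = bisect.bisect_left(sorted_values, v)
    # The nearest neighbor is one of the values around the insertion point.
    candidates = sorted_values[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda x: abs(x - v))

data = [3, 10, 25, 40]
print(nn_1d(18, data))  # 25
```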

Indexing Methods In some cases, faster algorithms exist that, however, are approximate. They do not guarantee finding the true nearest neighbor all the time. They guarantee that they find the true nearest neighbor with a certain probability.

The Curse of Dimensionality The “curse of dimensionality” is a commonly used term in artificial intelligence. Common enough that it has a dedicated Wikipedia article. It is actually not a single curse; it shows up in many different ways. The curse consists of the fact that lots of AI methods are very good at handling low-dimensional data, but horrible at handling high-dimensional data. The nearest neighbor problem is an example of this curse. Finding nearest neighbors in low dimensions (like 1, 2, or 3 dimensions) can be done in close to logarithmic time. However, these approaches take time exponential in 𝐷. By the time you get to 50, 100, or 1000 dimensions, they become hopeless. Data oftentimes has thousands or millions of dimensions.

More Exotic Distance Measures The Euclidean and Manhattan distance are the most commonly used distance measures. However, in some cases they do not make much sense, or they cannot even be applied. In such cases, more exotic distance measures can be used. A few examples (this is not required material, it is just for your reference): The edit distance for strings (hopefully you’ve seen it in your algorithms class). Dynamic time warping for time series. Bipartite matching for sets.

Case Study: Hand Pose Estimation Hand pose is defined by: Handshape: the position, orientation, and shape of each finger. 3D orientation: the direction that the hand points to. input image Hand pose: Hand shape 3D hand orientation.

Hand Shapes Handshapes are specified by the joint angles of the fingers. Hands are very flexible. Hundreds or thousands of distinct hand shapes.

3D Hand Orientation Images of the same handshape can look VERY different. Appearance depends on the 3D orientation of the hand with respect to the camera. Here are some examples of the same shape seen under different orientations.

Hand Pose Estimation: Applications There are several applications of hand pose estimation (if the estimates are sufficiently accurate, which is a big if): Sign language recognition. Human-computer interfaces (controlling applications and games via gestures). Clinical applications (studying the motion skills of children, patients…). input image Hand pose: Hand shape 3D hand orientation.

Hand Pose Estimation Hand pose: Hand shape 3D hand orientation. There are many different computer vision methods for estimating hand pose. The performance of these methods (so far) is much lower than that of the human visual system. However, performance can be good enough for specific applications.

Hand Pose Estimation Hand pose: Hand shape 3D hand orientation. We will take a look at a simple method, based on nearest neighbor classification. This method is described in more detail in this paper: Vassilis Athitsos and Stan Sclaroff, "An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation". IEEE Conference on Automatic Face and Gesture Recognition, 2002. http://vlm1.uta.edu/~athitsos/publications/bucs-2001-022.pdf

Preprocessing: Hand Detection This specific nearest-neighbor method assumes that, somehow, the input image can be preprocessed, so as to: Detect the location of the hand. Segment the hand (i.e., separate the hand pixels from the background pixels). segmented hand

Preprocessing: Hand Detection In the general case, hand detection and segmentation are in themselves very challenging tasks. However, in images like the example input image shown here, standard computer vision methods work pretty well. You will learn to implement such methods if you take the computer vision class. input image segmented hand

Problem Definition Assuming that the hand has been segmented, our task is to learn a function that maps the segmented hand image to the correct hand pose. segmented hand Hand pose: Hand shape 3D hand orientation.

Problem Definition: Hand Shapes We will train a system that only recognizes some selected handshapes (for example, 20-30 specific handshapes). This means that the system will not recognize arbitrary handshapes. A few tens of shapes, however, is sufficient for some applications, like sign language recognition or using hands to control games and applications.

Problem Definition: Orientations We will allow the hand to have any 3D orientation whatsoever. Our goal is to be able to recognize the handshape under any orientation. The system will estimate both the handshape and the 3D orientation.

Training Set The training set is generated using computer graphics software. 26 hand shapes. 4000 orientations for each shape. Over 100,000 images in total.

Euclidean Distance Works Poorly Suppose we want to use nearest neighbor classification, and we choose the Euclidean distance as the distance measure. The Euclidean distance between two images compares the color values of the two images at each pixel location. Color is not a reliable feature. It depends on many things, like skin color, lighting conditions, shadows…

Converting to Edge Images Edge images: images where edge pixels are white, and everything else is black. Roughly speaking, edge pixels are boundary pixels, having significantly different color than some of their neighbors. One of the first topics in any computer vision course is how to compute edge images. original image edge image

Converting to Edge Images Edge images are more stable than color images. They still depend somewhat on things like skin color, illumination, shadows, but not as much as color images. original image edge image

Euclidean Distance on Edge Images test image superimposed on training image test image training image The Euclidean distance also works poorly on edge images. The distance between two edge images depends entirely on the number of pixel locations where one image has an edge pixel and the other image does not have an edge pixel. Even two nearly identical edge images can have a large distance, if they are slightly misaligned.

Euclidean Distance on Edge Images test image superimposed on training image test image training image For example, consider the test image superimposed on the training image: Even though the two images look similar, the edge pixel locations do not match. The Euclidean distance is high, and does not capture the similarity between those two images.

Chamfer Distance test image superimposed on training image test image training image In the chamfer distance, each edge image is simply represented as a set of points (pixel locations). This set contains the pixel location of every edge pixel in that image. The pixel location is just two numbers: row and column. How can we define a distance between two sets of points?

Directed Chamfer Distance Input: two sets of points. 𝑅 is the set of red points. 𝐺 is the set of green points. For each red point 𝑃𝑟: Identify the green point 𝑃𝑔 that is the closest to 𝑃𝑟. The directed chamfer distance 𝑐(𝑅, 𝐺) is the average distance from each red point to its nearest green point. For the two sets shown on the figure: Each red point is connected by a link to its nearest green point. 𝑐(𝑅, 𝐺) is the average length of those links.

Directed Chamfer Distance Input: two sets of points. 𝑅 is the set of red points. 𝐺 is the set of green points. For each green point 𝑃𝑔: Identify the red point 𝑃𝑟 that is the closest to 𝑃𝑔. The directed chamfer distance 𝑐(𝐺, 𝑅) is the average distance from each green point to its nearest red point. For the two sets shown on the figure: Each green point is connected by a link to its nearest red point. 𝑐(𝐺, 𝑅) is the average length of those links.

Chamfer Distance Input: two sets of points. 𝑅 is the set of red points. 𝐺 is the set of green points. 𝑐(𝑅, 𝐺): Average distance from each red point to its nearest green point. 𝑐(𝐺, 𝑅): Average distance from each green point to its nearest red point. The chamfer distance 𝐶(𝑅, 𝐺) is simply the sum of the two directed chamfer distances 𝑐(𝑅, 𝐺) and 𝑐(𝐺, 𝑅): 𝑪(𝑹, 𝑮) = 𝒄(𝑹, 𝑮) + 𝒄(𝑮, 𝑹)

Chamfer Distance Example 𝑅 = {(12, 5), (8, 6), (5, 13)}, 𝐺 = {(5, 7), (6, 11)}. First, we compute the directed distance 𝑐(𝑅, 𝐺): For (12, 5), the closest point in 𝐺 is (5, 7). Distance: √((12−5)² + (5−7)²) = 7.28. For (8, 6), the closest point in 𝐺 is (5, 7). Distance: √((8−5)² + (6−7)²) = 3.16. For (5, 13), the closest point in 𝐺 is (6, 11). Distance: √((5−6)² + (13−11)²) = 2.24. 𝑐(𝑅, 𝐺) = (7.28 + 3.16 + 2.24) / 3 = 4.23.

Chamfer Distance Example 𝑅 = {(12, 5), (8, 6), (5, 13)}, 𝐺 = {(5, 7), (6, 11)}. Second, we compute the directed distance 𝑐(𝐺, 𝑅): For (5, 7), the closest point in 𝑅 is (8, 6). Distance: √((5−8)² + (7−6)²) = 3.16. For (6, 11), the closest point in 𝑅 is (5, 13). Distance: √((6−5)² + (11−13)²) = 2.24. 𝑐(𝐺, 𝑅) = (3.16 + 2.24) / 2 = 2.70.

Chamfer Distance Example 𝑅 = {(12, 5), (8, 6), (5, 13)}, 𝐺 = {(5, 7), (6, 11)}. Third, we compute the undirected chamfer distance 𝐶(𝑅, 𝐺): The undirected chamfer distance is the sum of the two directed distances. 𝐶(𝑅, 𝐺) = 𝑐(𝑅, 𝐺) + 𝑐(𝐺, 𝑅) = 4.23 + 2.70 = 6.93.
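The worked example can be checked with a direct implementation of the definitions (a straightforward sketch that compares every point of one set to every point of the other; faster versions exist):

```python
import math

def directed_chamfer(a, b):
    """c(A, B): average distance from each point of A to its nearest
    point of B."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def chamfer(a, b):
    """Undirected chamfer distance C(A, B) = c(A, B) + c(B, A)."""
    return directed_chamfer(a, b) + directed_chamfer(b, a)

R = [(12, 5), (8, 6), (5, 13)]
G = [(5, 7), (6, 11)]
print(round(directed_chamfer(R, G), 2))  # 4.23
print(round(directed_chamfer(G, R), 2))  # 2.7
print(round(chamfer(R, G), 2))           # 6.93
```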

Efficiency of the Chamfer Distance test image training image For every edge pixel in one image we must find the distance to the closest edge pixel in the other image. Suppose that each of our two images has approximately 𝑑 edge pixels. It takes 𝑂(𝑑²) time to measure the distance between every edge pixel in one image and every edge pixel in the other image. There are more efficient algorithms, taking 𝑂(𝑑 log 𝑑) time.

Other Non-Euclidean Distance Measures Bipartite matching. Like the chamfer distance, bipartite matching can be used as a distance measure for sets of points. It finds the optimal one-to-one correspondence between points of one set and points of the other set. Time complexity: cubic in the number of points. Kullback-Leibler (KL) divergence. It is a dissimilarity measure for probability distributions (it is not symmetric, so it is not a true distance metric).

Distance/Similarity Measures for Strings Edit distance. The minimum number of insert/delete/substitute operations needed to convert one string into another string. Smith-Waterman. A similarity measure for strings; high values indicate higher similarity. Here, the “nearest neighbor” would be the training string with the highest Smith-Waterman score with respect to the test string. Longest Common Subsequence (LCSS).
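As an illustration, the edit distance has a classic dynamic-programming implementation (a sketch that keeps only one previous row of the DP table):

```python
def edit_distance(s, t):
    """Minimum number of insert/delete/substitute operations turning s into t."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i in range(1, len(s) + 1):
        cur = [i] + [0] * len(t)
        for j in range(1, len(t) + 1):
            substitute_cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,                    # delete s[i-1]
                         cur[j - 1] + 1,                 # insert t[j-1]
                         prev[j - 1] + substitute_cost)  # substitute or match
        prev = cur
    return prev[len(t)]

print(edit_distance("kitten", "sitting"))  # 3
```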

Distance Measures for Time Series Dynamic Time Warping (DTW). We will talk about that in more detail. Edit Distance with Real Penalty (ERP). Time Warping Edit Distance (TWED). Move-Split-Merge (MSM).

Nearest Neighbors: Learning What is the "training stage" for a nearest neighbor classifier?

Nearest Neighbors: Learning What is the "training stage" for a nearest neighbor classifier? In the simplest case, none. Once we have a training set, and we have chosen a distance measure, no other training is needed.

Nearest Neighbors: Learning What is the "training stage" for a nearest neighbor classifier? In the simplest case, none. Once we have a training set, and we have chosen a distance measure, no other training is needed. There are things that can be learned, if desired: Choosing elements for the training set. Condensing is a method that removes training objects that actually hurt classification accuracy. Parameters of the distance measure. For example, if we use the Euclidean distance, the weight of each dimension (or a projection function, mapping data to a new space) can be learned.

Nearest Neighbor Classification: Recap Nearest neighbor classifiers can be pretty simple to implement. Just pick a training set and a distance measure. They become increasingly accurate as we get more training examples. Nearest neighbor classifiers can be defined even if we have only one training example per class. Most methods require significantly more data.

Nearest Neighbor Classification: Recap In Euclidean spaces, normalizing the values in each dimension may be necessary before measuring distances. Or, alternatively, learning optimal weights for each dimension. Non-Euclidean distance measures are defined for non-Euclidean spaces (edge images, strings, time series, probability distributions). Finding nearest neighbors in high-dimensional spaces, or non-Euclidean spaces, can be very, very slow when we have lots of training data.