Nearest-Neighbor Classifiers (Sec. 4.7)
5 minutes of math... Definition: a metric is a function d(x, y) that obeys the following properties: Identity: d(x, y) = 0 if and only if x = y. Symmetry: d(x, y) = d(y, x). Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).
5 minutes of math... Examples: Euclidean distance: d(x, y) = sqrt( Σ_i (x_i − y_i)² ) * Note: omitting the square root gives squared Euclidean distance, which is no longer a metric (it can violate the triangle inequality), but it preserves the nearest-neighbor ordering and so usually won't change our results
5 minutes of math... Examples: Manhattan (taxicab) distance: d(x, y) = Σ_i |x_i − y_i|. Distance travelled along a grid between two points; no diagonals allowed.
5 minutes of math... Examples: What if some attribute is categorical? Typical answer is 0/1 distance: for each attribute, add 1 if the instances differ in that attribute, else 0. (To make Daniel happy: for (i = 0; i < xa.length; ++i) { d += (xa[i] != xb[i]) ? 1 : 0; } )
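The per-attribute loop above can be sketched as a runnable function (a sketch only; the names `xa`, `xb` follow the slide and the example instances are made up):

```python
def categorical_distance(xa, xb):
    """0/1 distance: count the attributes on which two instances differ."""
    assert len(xa) == len(xb), "instances must have the same attributes"
    return sum(1 for a, b in zip(xa, xb) if a != b)

# Two instances differing in two of four categorical attributes
print(categorical_distance(["red", "S", "new", "yes"],
                           ["red", "M", "old", "yes"]))  # → 2
```

This is exactly the Hamming distance on attribute vectors, and it satisfies all three metric properties above.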
Distances in classification Nearest neighbor: find the nearest training instance to the query point in feature space and return the class of that instance. Simplest possible distance-based classifier. With more notation: class(x_q) = class( argmin_{x_i ∈ D} d(x_q, x_i) ). The distance function d is anything appropriate to your data.
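That one-line rule can be written out directly (a minimal sketch; the function names and toy data set are illustrative, not from the course):

```python
import math

def euclidean(xa, xb):
    # Euclidean distance; dropping the sqrt would not change the argmin
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))

def nn_classify(query, data, dist=euclidean):
    """Return the label of the training instance nearest to `query`.
    `data` is a list of (feature_vector, label) pairs."""
    nearest = min(data, key=lambda inst: dist(query, inst[0]))
    return nearest[1]

data = [([0.0, 0.0], "neg"), ([1.0, 1.0], "pos"), ([0.9, 0.8], "pos")]
print(nn_classify([0.2, 0.1], data))  # → neg
```

Note there is no training step beyond storing `data`; all the work happens at query time.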
Properties of NN Training time of NN? Classification time? Geometry of model?
NN miscellany Slight generalization: k-nearest neighbors (k-NN): find the k training instances closest to the query point and vote among them for the label. Q: How does this affect the system? Q: Why does it work?
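The generalization is a small change to the 1-NN rule: keep the k closest instances and take a majority vote (a sketch under made-up names and data; any metric from the earlier slides can serve as `dist`):

```python
from collections import Counter

def manhattan(xa, xb):
    return sum(abs(a - b) for a, b in zip(xa, xb))

def knn_classify(query, data, k, dist=manhattan):
    """Vote among the k training instances closest to `query`.
    `data` is a list of (feature_vector, label) pairs."""
    neighbors = sorted(data, key=lambda inst: dist(query, inst[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

data = [([0.0, 0.0], "neg"), ([0.1, 0.0], "neg"),
        ([1.0, 1.0], "pos"), ([0.9, 1.0], "pos"), ([1.0, 0.9], "pos")]
print(knn_classify([0.8, 0.8], data, k=3))  # → pos
```

With k = 1 this reduces to plain NN; larger k smooths the decision boundary by averaging over more neighbors.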
Geometry of k-NN (figure: a query point and the region enclosing its k nearest neighbors)
Exercise Show that k-NN does something reasonable: Assume binary data. Let X be the query point and X' any k-neighbor of X. Let p = Pr[Class(X') = Class(X)], with p > 1/2. What is Pr[X receives correct label]? What happens as k grows? But there are tradeoffs... Let V(k, N) = volume of the sphere enclosing the k neighbors of X, assuming N points in the data set. For fixed N, what happens to V(k, N) as k grows? For fixed k, what happens to V(k, N) as N grows? What about the radius of V(k, N)?
Exercise What is Pr[X receives correct label]? Treating each of the k neighbors as having the correct class independently with probability p, the majority vote wins when more than k/2 neighbors are correct: Pr[correct] = Σ_{j > k/2} C(k, j) p^j (1 − p)^{k−j} — the upper tail of a Binomial(k, p).
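The binomial tail can be checked numerically (a sketch; the values of p and k are illustrative, and k is taken odd so ties cannot occur):

```python
from math import comb

def pr_correct(k, p):
    """Pr[majority of k neighbors has the true class],
    each neighbor being correct independently with probability p."""
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j)
               for j in range(k // 2 + 1, k + 1))

for k in (1, 3, 11, 101):
    print(k, round(pr_correct(k, 0.7), 4))
```

As the printout suggests, for any fixed p > 1/2 the probability of a correct vote increases toward 1 as k grows.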
Exercise What happens as k → ∞? Theorem: in the limit of large k, the binomial distribution is well approximated by the Gaussian: Binomial(k, p) ≈ N(kp, kp(1 − p)).
Exercise So: Pr[correct] ≈ Pr[ N(kp, kp(1 − p)) > k/2 ] → 1 as k → ∞, since the gap between the mean and the threshold, kp − k/2 = k(p − 1/2), grows linearly in k while the standard deviation grows only as √k.
NN miscellany Gotcha: unscaled dimensions. What happens if one axis is measured in microns and one in light-years? The usual trick is to scale each axis to the [0, 1] range. (Sometimes [−1, 1] is useful as well.)
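The rescaling trick is a per-column min-max transform (a minimal sketch; the function name and the microns-vs-light-years toy data are made up for illustration):

```python
def minmax_scale(data):
    """Scale each attribute (column) of `data` to the [0, 1] range."""
    lo = [min(col) for col in zip(*data)]
    hi = [max(col) for col in zip(*data)]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(row, lo, hi)]
            for row in data]

# One axis in microns, one in light-years: without scaling,
# the second axis completely dominates any distance computation.
data = [[1.0, 9.0e15], [3.0, 9.5e15], [2.0, 9.2e15]]
print(minmax_scale(data))
```

After scaling, both axes contribute on comparable footing; constant columns are mapped to 0 to avoid division by zero.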