Instance-Based Learning: IB1 and IBK (small section in Chapter 20)
1-Nearest Neighbor
Basic distance function between attribute values:
–If real, the absolute value of the difference.
–If nominal, d(v1, v2) = 1 if v1 ≠ v2, else 0.
The distance between two instances is the square root of the sum of the squared per-attribute distances.
Usually normalize real-valued distances for fairness among attributes.
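Below is a minimal sketch of the distance computation just described, with real-valued attributes normalized by their range. The function names, the list-of-values instance representation, and the range-based normalization are assumptions for illustration, not taken from the chapter.

```python
import math

def attribute_distance(v1, v2, is_real, value_range=1.0):
    """Distance between two values of a single attribute (assumed helper)."""
    if is_real:
        # Real-valued: absolute difference, normalized by the attribute's range
        # so that no single attribute dominates the sum.
        return abs(v1 - v2) / value_range if value_range else 0.0
    # Nominal: 1 if the values differ, 0 if they match.
    return 0.0 if v1 == v2 else 1.0

def instance_distance(x, y, is_real_flags, ranges):
    """Square root of the sum of squared per-attribute distances."""
    total = 0.0
    for v1, v2, is_real, rng in zip(x, y, is_real_flags, ranges):
        d = attribute_distance(v1, v2, is_real, rng)
        total += d * d
    return math.sqrt(total)

# Example: one real attribute (range 100) and one nominal attribute.
print(instance_distance([25.0, "red"], [75.0, "blue"],
                        is_real_flags=[True, False], ranges=[100.0, 1.0]))
```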
Prediction
For an instance x, let y be the closest instance to x in the training set.
Predict that the class of x is the class of y.
On some data sets, this is the best algorithm.
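A short sketch of this prediction rule. The training set is assumed to be a list of (instance, class) pairs, and `distance` can be any pairwise distance function, e.g. the instance_distance sketch above.

```python
def predict_1nn(x, training_set, distance):
    """Predict the class of x as the class of its closest training instance y."""
    y, cls = min(training_set, key=lambda pair: distance(x, pair[0]))
    return cls
```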
Voronoi Diagram
For each point, draw the boundary of all points closest to it. Each point’s sphere of influence is convex. If the data are noisy, this can be bad. http://www.cs.cornell.edu/Info/People/chew/Delaunay.html - nice applet.
Problems and solutions
Noise
–Remove bad examples
–Use voting
Bad distance measure
–Use probability class vector
Memory
–Remove unneeded examples
Voting schemes
K nearest neighbor
–Let all of the closest k neighbors vote (use odd k).
Kernel K(x,y), a similarity function
–Let everyone vote, with weight decreasing according to K(x,y).
–Ex: K(x,y) = e^(-distance(x,y)^2)
–Ex: K(x,y) = inner product of x and y
–Ex: K(x,y) = inner product of f(x) and f(y), where f is some mapping of x and y into R^n.
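The sketch below shows both voting schemes, using the Gaussian kernel e^(-distance(x,y)^2) from the first example. The function names and the (instance, class) list representation are illustrative assumptions.

```python
import math
from collections import defaultdict

def predict_knn(x, training_set, k, distance):
    """Majority vote among the k nearest neighbors (use odd k to avoid ties)."""
    neighbors = sorted(training_set, key=lambda pair: distance(x, pair[0]))[:k]
    votes = defaultdict(int)
    for _, cls in neighbors:
        votes[cls] += 1
    return max(votes, key=votes.get)

def predict_kernel(x, training_set, distance):
    """Everyone votes, weighted by K(x, y) = exp(-distance(x, y)^2)."""
    votes = defaultdict(float)
    for y, cls in training_set:
        votes[cls] += math.exp(-distance(x, y) ** 2)
    return max(votes, key=votes.get)
```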
Choosing the parameter k
Divide the data into train and test.
Run multiple values of k on the training set.
Choose the k that does best on the test set.
NOT – you have now used the test data to pick k.
Internal Cross-validation
This can be used for selecting any parameter.
Divide the data into train and test.
Now do 10-fold CV on the training data to determine the appropriate value of k.
Note: never touch the test data.
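A rough sketch of internal cross-validation for choosing k, assuming the predict_knn helper and (instance, class) representation from the earlier sketches; the simple striped fold construction is an arbitrary choice for brevity.

```python
def choose_k_by_internal_cv(training_set, candidate_ks, distance, n_folds=10):
    """Pick k by 10-fold CV on the training data only; the test set is never used."""
    folds = [training_set[i::n_folds] for i in range(n_folds)]  # striped folds
    best_k, best_acc = None, -1.0
    for k in candidate_ks:
        correct, total = 0, 0
        for i in range(n_folds):
            held_out = folds[i]
            cv_train = [p for j, fold in enumerate(folds) if j != i for p in fold]
            for x, cls in held_out:
                correct += (predict_knn(x, cv_train, k, distance) == cls)
                total += 1
        accuracy = correct / total
        if accuracy > best_acc:
            best_k, best_acc = k, accuracy
    return best_k
```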
Probability Class Vector
Let A be an attribute with values v1, v2, ..., vn.
Suppose the classes are C1, C2, ..., Ck.
The probability class vector for vi is:
PCV(vi) = ( P(C1 | A = vi), P(C2 | A = vi), ..., P(Ck | A = vi) )
Distance(vi, vj) = the distance between the two probability class vectors.
PCV
If an attribute is irrelevant and v and v’ are two of its values, then PCV(v) ≈ PCV(v’), so the distance will be close to 0. This discounts irrelevant attributes.
It also works for real-valued attributes, after binning. Binning is a way to make real values symbolic: simply break the data into k bins (k = 5 or 10 seems to work), or use decision trees.
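A small sketch of how a PCV could be estimated from class counts and used as a distance between nominal values. The build_pcv and pcv_distance names and the (value, class) pair input are assumptions for illustration; values of an irrelevant attribute end up with nearly identical vectors, so their distance is near 0.

```python
import math
from collections import Counter, defaultdict

def build_pcv(value_class_pairs, classes):
    """Map each attribute value v to its probability class vector P(Cj | A = v)."""
    counts = defaultdict(Counter)
    for value, cls in value_class_pairs:
        counts[value][cls] += 1
    pcv = {}
    for value, cls_counts in counts.items():
        total = sum(cls_counts.values())
        pcv[value] = [cls_counts[c] / total for c in classes]
    return pcv

def pcv_distance(pcv, v1, v2):
    """Distance between two attribute values = distance between their PCVs."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pcv[v1], pcv[v2])))
```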
Regression by NN
If 1-NN, use the value of the nearest example.
If k-NN, interpolate the values of the k nearest neighbors.
Kernel methods work too. You avoid the choice of k, but hide it in the choice of kernel function.
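A brief sketch of both regression variants, reusing the Gaussian kernel weighting from the voting sketch; the (instance, target value) pair representation and the distance argument are again illustrative assumptions.

```python
import math

def regress_knn(x, training_set, k, distance):
    """Predict the mean target value of the k nearest neighbors."""
    neighbors = sorted(training_set, key=lambda pair: distance(x, pair[0]))[:k]
    return sum(value for _, value in neighbors) / k

def regress_kernel(x, training_set, distance):
    """Kernel-weighted average of all target values (no explicit k to choose)."""
    weights = [math.exp(-distance(x, y) ** 2) for y, _ in training_set]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, training_set)) / total
```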
Summary
NN works for multi-class problems and regression.
Sometimes called the “poor man’s neural net”.
With enough data, its error rate is at most twice the Bayes-optimal error rate.
Misled by bad examples and bad features.
Separates classes via piecewise linear boundaries.