Download presentation
Presentation is loading. Please wait.
Published byLauryn Louth Modified over 9 years ago
1
Instance Based Learning IB1 and IBK Find in text Early approach
2
1- Nearest Neighbor Basic distance function between attribute-values –If real, the absolute value –If nominal, d(v1,v2) = 1 if v1 \=v2, else 0. Distance between 2 instances is square root of sum of squares, i.e. euclidean distance –Square root of sum of squares –Sqrt( (x1-y1)^2 +….(xn-yn)^2 ) May normalize real-value distances for fairness amongst attributes.
3
Prediction or classification For instance x, let y be closest instance to x in training set. Predict class x is the class of y. On some data sets, best algorithm. In general, no best learning algorithm.
4
Voronoi Diagram
5
For each point, draw the boundary of all points closest to it. Each point’s sphere of influence in convex. If noisy, can be bad. http://www.cs.cornell.edu/Info/People/chew /Delaunay.html - nice applet.http://www.cs.cornell.edu/Info/People/chew /Delaunay.html
6
Problems and solutions Noise –Remove bad examples –Use voting Bad distance measure –Use probability class vector Memory –Remove unneeded examples
7
Voting schemes K nearest neighbor –Let all the closest k neighbors vote (use k odd) Kernel K(x,y) – a similarity function –Let everyone vote, with decreasing weight according to K(x,y) –Ex: K(x,y) = e^(-distance(x,y)^2) –Ex. K(x,y) = inner product of x and y –Ex K(x,y) = inner product of f(x) and f(y) where f is some mapping of x and y into R^n.
8
Choosing the parameter K Divide data into train and test Run multiple values of k on train Choose k that does best on test.
9
NOT This is a serious methological error You have used test data to pick the k. Common in commercial evaluation of systems Occasional in academic papers
10
Fix: Internal Cross-validation This can be used for selecting any parameter. Divide Data into Train and Test. Now do 10-fold CV on the training data to determine the appropriate value of k. Note: never touch the test data.
11
Probability Class Vector Let A be an attribute with values v1, v2,..vn Suppose class C1,C2,..Ck Prob Class Vector for vi is: Distance(vi,vj) = distance between probability class vectors.
12
Weather data @relation weather @attribute outlook {sunny, overcast, rainy} @attribute temperature real @attribute humidity real @attribute windy {TRUE, FALSE} @attribute play {yes, no} @data sunny,85,85,FALSE,no sunny,80,90,TRUE,no overcast,83,86,FALSE,yes rainy,70,96,FALSE,yes rainy,68,80,FALSE,yes rainy,65,70,TRUE,no overcast,64,65,TRUE,yes sunny,72,95,FALSE,no sunny,69,70,FALSE,yes rainy,75,80,FALSE,yes sunny,75,70,TRUE,yes overcast,72,90,TRUE,yes overcast,81,75,FALSE,yes rainy,71,91,TRUE,no
13
Distance(sunny,rainy) =? = = prob class vector for sunny = Distance(sunny,rainy) = 1/5*sqrt(2). Similarly: distance(sunny,overcast) = d(, ) = 2/5*sqrt(2)
14
PCV If an attribute is irrelevant and v and v’ are values, then PCV(v) ~ PCV(v’) so the distance will be close to 0. This discounts irrelevant attributes. It also works for real-attributes, after binning. Binning is a way to make real-values symbolic. Simple break data into k bins, eg. K = 5 or 10 seems to work. Or use DTs.
15
Regression by NN If 1-NN, use value of nearest example If k-nn, interpolate values of k nearest neighbors. Kernel methods work to. You avoid choice of k, but hide it in choice of kernel function.
16
Summary NN works for multi-class and regression. Sometimes called “poor man’s neural net’’ With enough data, it achieves ½ the “bayes optimal” error rate. Mislead by bad examples and bad features. Separates classes via piecewise linear boundaries.
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.