4.1 Introduction

Instance-Based Learning: local approximation to the target function that applies in the neighborhood of the query instance
– Cost of classifying new instances can be high: nearly all computation takes place at classification time
– Examples: k-Nearest Neighbors
– Radial Basis Functions: a bridge between instance-based learning and artificial neural networks
[Figure slides: a k-Nearest Neighbors example; the most plausible hypothesis for the sample data; alternative hypotheses ("Now? Or maybe…")]

Is the simplest hypothesis always the best one?
4.2 k-Nearest Neighbor Learning

Instance $x = [a_1(x), a_2(x), \ldots, a_n(x)] \in \mathbb{R}^n$

$d(x_i, x_j) = [(x_i - x_j) \cdot (x_i - x_j)]^{1/2}$ = Euclidean distance

– Discrete-Valued Target Functions: $f : \mathbb{R}^n \to V = \{v_1, v_2, \ldots, v_s\}$
Prediction for a new query $x$ (over the $k$ nearest neighbors $x_1, \ldots, x_k$ of $x$):

$\hat{f}(x) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$

where $\delta(v, f(x_i)) = 1$ if $v = f(x_i)$, and $\delta(v, f(x_i)) = 0$ otherwise

– Continuous-Valued Target Functions: $\hat{f}(x) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)$
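Both prediction rules translate directly into code. Below is a minimal NumPy sketch (names such as knn_predict and X_train are illustrative, not from the slides) covering the discrete majority-vote case and the continuous averaging case:

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, discrete=True):
    # Euclidean distances d(x_i, x) = sqrt((x_i - x).(x_i - x))
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nn = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    if discrete:
        # f(x) = argmax_v sum_i delta(v, f(x_i)): majority vote
        values, counts = np.unique(y_train[nn], return_counts=True)
        return values[np.argmax(counts)]
    # Continuous target: f(x) = (1/k) sum_i f(x_i)
    return y_train[nn].mean()
```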
Distance-Weighted k-NN

$\hat{f}(x) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i \, \delta(v, f(x_i))$

$\hat{f}(x) = \sum_{i=1}^{k} w_i f(x_i) \Big/ \sum_{i=1}^{k} w_i$

$w_i = [d(x_i, x)]^{-2}$

Weights the closest neighbors more heavily
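A sketch of the distance-weighted variant under the same assumed setup; the early return for a zero distance is an added assumption, needed to keep $w_i = d^{-2}$ well defined when the query coincides with a stored instance:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=3, discrete=True):
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nn = np.argsort(dists)[:k]
    if dists[nn[0]] == 0.0:
        # Query matches a stored instance exactly: return its value
        return y_train[nn[0]]
    w = dists[nn] ** -2                 # w_i = d(x_i, x)^-2
    if discrete:
        # argmax_v sum_i w_i * delta(v, f(x_i)): weighted vote
        values = np.unique(y_train[nn])
        scores = [w[y_train[nn] == v].sum() for v in values]
        return values[np.argmax(scores)]
    # sum_i w_i f(x_i) / sum_i w_i
    return np.dot(w, y_train[nn]) / w.sum()
```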
Remarks on k-NN
– Robust to noise
– Quite effective for large training sets
– Inductive bias: the classification of an instance will be most similar to the classification of instances that are nearby in Euclidean distance
– Especially sensitive to the curse of dimensionality
– Irrelevant attributes can be eliminated by a suitably chosen metric (sketched below): $d(x_i, x_j) = [(x_i - x_j)^{\mathsf{T}} G \, (x_i - x_j)]^{1/2}$
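For illustration, a hypothetical diagonal $G$ that zeroes out an irrelevant attribute; the function name and the example matrix are assumptions, not from the slides:

```python
import numpy as np

def metric_G(xi, xj, G):
    # d(x_i, x_j) = [(x_i - x_j)^T G (x_i - x_j)]^(1/2)
    diff = xi - xj
    return np.sqrt(diff @ G @ diff)

# e.g. ignore the third attribute entirely:
G = np.diag([1.0, 1.0, 0.0])
```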
4.3 Locally Weighted Regression

Builds an explicit approximation to $f(x)$ over a local region surrounding $x$ (usually a linear or quadratic fit to the training examples nearest to $x$)

Locally Weighted Linear Regression:

$f_L(x) = w_0 + w_1 x_1 + \cdots + w_n x_n$

$E(x) = \sum_{i=1}^{k} [f_L(x_i) - f(x_i)]^2$  ($x_i$ among the $k$ nearest neighbors of $x$)
Generalization:

$f_L(x) = w_0 + w_1 x_1 + \cdots + w_n x_n$

$E(x) = \sum_{i=1}^{N} K[d(x_i, x)] \, [f_L(x_i) - f(x_i)]^2$

where $K[d(x_i, x)]$ is a kernel function

Other possibility: $f_Q(x)$ = a quadratic function of the $x_j$
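A minimal sketch of locally weighted linear regression, assuming a Gaussian kernel over the distance to the query; the bandwidth parameter is an assumption. Minimizing $E(x)$ is a weighted least-squares problem, solved here by rescaling each row by $\sqrt{K_i}$:

```python
import numpy as np

def lwr_predict(X_train, y_train, x_query, bandwidth=1.0):
    # Kernel K[d(x_i, x)]: Gaussian in the distance to the query
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    K = np.exp(-dists ** 2 / (2 * bandwidth ** 2))
    # Augment with a constant column so w_0 is the intercept
    A = np.hstack([np.ones((len(X_train), 1)), X_train])
    # Minimize E(x) = sum_i K_i (f_L(x_i) - f(x_i))^2 via
    # ordinary least squares on rows scaled by sqrt(K_i)
    sw = np.sqrt(K)
    w, *_ = np.linalg.lstsq(A * sw[:, None], y_train * sw, rcond=None)
    return w[0] + w[1:] @ x_query
```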
4.4 Radial Basis Functions

An approach closely related to distance-weighted regression and artificial neural network learning

$f_{RBF}(x) = w_0 + \sum_{\mu=1}^{k} w_\mu K[d(x_\mu, x)]$

$K[d(x_\mu, x)] = \exp[-d^2(x_\mu, x) / (2\sigma_\mu^2)]$ = Gaussian kernel function
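The prediction is a single weighted sum over Gaussian bumps centered at the $x_\mu$. A minimal sketch (argument names are illustrative):

```python
import numpy as np

def rbf_predict(x, w0, W, centers, sigmas):
    # f_RBF(x) = w_0 + sum_mu w_mu * exp(-d^2(x_mu, x) / (2 sigma_mu^2))
    d2 = np.sum((centers - x) ** 2, axis=1)
    return w0 + W @ np.exp(-d2 / (2 * sigmas ** 2))
```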
Training RBF Networks

1st stage: determination of $k$ (the number of basis functions) and of $x_\mu$ and $\sigma_\mu$ (the kernel parameters), e.g. with the Expectation-Maximization (EM) algorithm

2nd stage: determination of the weights $w_\mu$, a linear problem (see the sketch below)
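A sketch of the two-stage procedure. The slide suggests EM for the first stage; this sketch substitutes plain k-means as a simpler stand-in for choosing the centers, and the width heuristic is an assumption, not from the slides. The second stage is the linear least-squares solve:

```python
import numpy as np

def train_rbf(X, y, k=5, n_iter=20):
    # 1st stage: choose centers x_mu (k-means stand-in for EM)
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    # Width heuristic (assumption): half the mean inter-center distance
    dc = np.sqrt(((centers[:, None] - centers) ** 2).sum(-1))
    sigma = np.full(k, 0.5 * dc[dc > 0].mean())
    # 2nd stage: the weights w_mu solve a linear least-squares problem
    d2 = ((X[:, None] - centers) ** 2).sum(-1)          # shape (N, k)
    Phi = np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2 * sigma ** 2))])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w[0], w[1:], centers, sigma
```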
4.6 Remarks on Lazy and Eager Learning

Lazy learning: stores the data and postpones decisions until a new query is presented

Eager learning: generalizes beyond the training data before a new query is presented

Lazy methods may consider the query instance $x$ when deciding how to generalize beyond the training data $D$ (local approximation)

Eager methods cannot (they have already chosen their global approximation to the target function)