Instance-Based Learning (k-Nearest Neighbors)
SAD Tagus 2004/05, H. Galhardas
The k-Nearest Neighbor Algorithm (1)
- All instances correspond to points in the n-dimensional space.
- Given an unknown tuple, the k-NN classifier searches the pattern space for the k training tuples that are closest to it.
- The nearest neighbor is defined in terms of Euclidean distance: d(x_i, x_j) = sqrt( Σ_{r=1}^{n} (a_r(x_i) − a_r(x_j))^2 ), where a_r(x) denotes the r-th attribute of instance x.
- The target function may be discrete- or real-valued.
- For a discrete-valued target, k-NN returns the most common value among the k training examples nearest to x_q.
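A minimal sketch of this distance computation and neighbor search in Python (the function names and the use of NumPy are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def euclidean_distance(a, b):
    """d(a, b) = sqrt(sum_r (a_r - b_r)^2)."""
    return np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

def k_nearest_indices(X_train, x_q, k):
    """Indices of the k training points closest to the query x_q."""
    dists = np.array([euclidean_distance(x, x_q) for x in X_train])
    return np.argsort(dists)[:k]
```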
The k-Nearest Neighbor Algorithm (2)
- Key idea: just store all training examples.
- Nearest neighbor: given a query instance x_q, first locate the nearest training example x_n, then estimate f(x_q) = f(x_n).
- k-Nearest neighbor: given x_q, take a vote among its k nearest neighbors (if the target function is discrete-valued), or take the mean of the f values of the k nearest neighbors (if real-valued): f̂(x_q) = (1/k) Σ_{i=1}^{k} f(x_i).
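A compact sketch of both prediction rules, majority vote for a discrete target and mean for a real-valued one (again an illustrative Python sketch, not code from the slides):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k, discrete=True):
    """k-NN prediction: majority vote for discrete targets, mean for real-valued ones."""
    dists = np.linalg.norm(np.asarray(X_train, float) - np.asarray(x_q, float), axis=1)
    nn = np.argsort(dists)[:k]                       # indices of the k nearest neighbors
    if discrete:
        return Counter(y_train[i] for i in nn).most_common(1)[0][0]
    return float(np.mean([y_train[i] for i in nn]))  # f_hat(x_q) = (1/k) * sum_i f(x_i)
```

For example, knn_predict(X, y, x_q, k=3) returns the majority class among the three training points closest to x_q.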
Lazy vs Eager Learning
- Instance-based learning: lazy evaluation.
- Decision-tree and Bayesian classification: eager evaluation.
- Key difference:
  - Lazy: may consider the query instance x_q when deciding how to generalize beyond the training data D.
  - Eager: cannot, since it has already committed to a global approximation by the time it sees the query.
- Efficiency: lazy methods spend less time training but more time predicting.
- Accuracy:
  - Lazy: effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function.
  - Eager: must commit to a single hypothesis that covers the entire instance space.
When to Consider Nearest Neighbors
- Instances map to points in R^N.
- Fewer than 20 attributes per instance, typically normalized.
- Lots of training data.
Advantages:
- Training is very fast.
- Can learn complex target functions.
- Does not lose information.
Disadvantages:
- Slow at query time (presorting and indexing the training samples into search trees reduces this cost; see the sketch below).
- Easily fooled by irrelevant attributes.
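One common way to index the training samples is a k-d tree; a minimal sketch using SciPy's cKDTree (the library choice and the synthetic data are assumptions for illustration):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 5))     # 10,000 synthetic training points in R^5

tree = cKDTree(X_train)               # build the index once, at "training" time
x_q = rng.random(5)
dists, idx = tree.query(x_q, k=7)     # fast lookup of the 7 nearest neighbors
```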
How to Determine a Good Value for k?
- Determined experimentally.
- Start with k = 1 and use a test set to estimate the error rate of the classifier.
- Repeat with k = k + 1.
- Choose the value of k for which the error rate is minimum.
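A sketch of this search using scikit-learn's KNeighborsClassifier on a held-out test set (the library, the dataset, and the range of k values are illustrative assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Increase k step by step and keep the value with the lowest test error.
best_k, best_err = None, 1.0
for k in range(1, 21):
    err = 1.0 - KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_te, y_te)
    if err < best_err:
        best_k, best_err = k, err
print("best k:", best_k, "error rate:", best_err)
```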
Definition of Voronoi Diagram
[Figure: the decision surface induced by 1-NN for a typical set of positive (+) and negative (−) training examples, with the query point x_q shown inside one cell.]
1-Nearest Neighbor
[Figure: query point q and its single nearest neighbor q_i.]
3-Nearest Neighbors
[Figure: query point q and its 3 nearest neighbors: 2 x, 1 o.]
7-Nearest Neighbors
[Figure: query point q and its 7 nearest neighbors: 3 x, 4 o; the majority class flips from x to o compared with 3-NN.]
Distance-Weighted k-NN
- Give more weight to neighbors closer to the query point:
  f̂(x_q) = Σ_{i=1}^{k} w_i f(x_i) / Σ_{i=1}^{k} w_i
  where w_i = K(d(x_q, x_i)) and d(x_q, x_i) is the distance between x_q and x_i.
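A sketch of the weighted rule with one common kernel choice, w_i = 1 / d(x_q, x_i)^2; the slide only requires some kernel K of the distance, so this particular K is an assumption:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_q, k, eps=1e-12):
    """Distance-weighted k-NN for a real-valued target:
    f_hat(x_q) = sum_i w_i * f(x_i) / sum_i w_i, with w_i = 1 / d(x_q, x_i)^2."""
    dists = np.linalg.norm(np.asarray(X_train, float) - np.asarray(x_q, float), axis=1)
    nn = np.argsort(dists)[:k]
    w = 1.0 / (dists[nn] ** 2 + eps)   # closer neighbors receive larger weights
    return float(np.sum(w * np.asarray(y_train, float)[nn]) / np.sum(w))
```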
Curse of Dimensionality
- Imagine instances described by 20 attributes, of which only 3 are relevant to the target function.
- Curse of dimensionality: the nearest neighbor is easily misled when the instance space is high-dimensional.
- One approach: stretch the j-th axis by a weight z_j, where z_1, …, z_n are chosen to minimize prediction error.
  - Use cross-validation to automatically choose the weights z_1, …, z_n.
  - Note that setting z_j to zero eliminates that dimension altogether (feature subset selection).
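Axis stretching amounts to multiplying each feature by its weight before computing distances. A small illustrative sketch of choosing among a few candidate weight vectors by cross-validation (the dataset, the candidate weights, and the use of scikit-learn are assumptions; a real search would optimize z_1, …, z_n more systematically):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)   # 4 attributes per instance

def cv_accuracy(z):
    """Stretch axis j by weight z_j, then score 5-NN with 5-fold cross-validation."""
    return cross_val_score(KNeighborsClassifier(n_neighbors=5), X * z, y, cv=5).mean()

# A few hand-picked candidate weight vectors; z_j = 0 drops the j-th feature entirely.
candidates = [np.ones(4), np.array([1.0, 1.0, 2.0, 2.0]), np.array([0.0, 0.0, 1.0, 1.0])]
best_z = max(candidates, key=cv_accuracy)
```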
Bibliography
- Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2001 (Sect. )
- Machine Learning, Tom Mitchell, McGraw-Hill, 1997 (Chapter 8)