CS480/680: Intro to ML
Lecture 03: K-nearest neighbors
08/04/2019, Yao-Liang Yu
Outline: Announcements, Algorithm, Theory, Application
Classification revisited
ŷ = sign(xᵀw + b); decision boundary: xᵀw + b = 0
Parametric: finite-dimensional w
Non-parametric: no specific form (or infinite-dimensional w)
1-Nearest Neighbour
Store the training set (X, y).
For a query (test point) x': find the nearest point x in X and predict y' = y(x).
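A minimal sketch of the 1-NN prediction rule in Python (NumPy assumed; the function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def nn1_predict(X, y, x_query):
    """Predict the label of x_query by copying the label of its nearest
    training point under the Euclidean distance."""
    dists = np.linalg.norm(X - x_query, axis=1)  # distance to every stored training point
    return y[np.argmin(dists)]                   # label of the closest one
```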
What do you mean "nearest"?
Need to measure distance (or similarity): d: X × X → ℝ₊ such that
symmetry: d(x, x') = d(x', x)
definiteness: d(x, x') = 0 iff x = x'
triangle inequality: d(a, b) ≤ d(a, c) + d(c, b)
Lp distance: d_p(x, x') = ||x − x'||_p
p = 2: Euclidean distance; p = 1: Manhattan distance; p = ∞: Chebyshev distance
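For concreteness, the three Lp distances named above can be computed with NumPy's norm (an illustrative sketch; the example points are made up):

```python
import numpy as np

def lp_distance(x, x_prime, p=2):
    """L_p distance ||x - x'||_p: p=1 Manhattan, p=2 Euclidean, p=np.inf Chebyshev."""
    return np.linalg.norm(np.asarray(x) - np.asarray(x_prime), ord=p)

x, x_prime = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(lp_distance(x, x_prime, 1),       # 7.0  (Manhattan)
      lp_distance(x, x_prime, 2),       # 5.0  (Euclidean)
      lp_distance(x, x_prime, np.inf))  # 4.0  (Chebyshev)
```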
Complexity of 1-NN
Training: essentially no computation, but O(nd) space.
Testing: O(nd) per query point.
(n: # of training samples, d: # of features)
Can we do better?
Voronoi diagram
In 2D, can construct in O(n log n) time and O(n) space (Fortune, 1989); a query then costs O(log n).
Large d? n^O(d) space with O(d log n) query time → approximate NN instead.
Normalization
Usually, for each feature, subtract the mean and divide by the standard deviation.
Or, equivalently, use a different distance metric.
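A sketch of per-feature standardization (the helper name and the epsilon guard are my additions, not from the slides):

```python
import numpy as np

def standardize(X_train, X_test):
    """Z-score each feature using statistics computed on the training set only,
    then apply the same transformation to the test set."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0) + 1e-12  # guard against zero-variance features
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```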
Learning the metric
Mahalanobis distance: d_M(x, x') = √((x − x')ᵀ M (x − x')) for some positive semidefinite matrix M.
Or equivalently, let M = LᵀL: first perform the linear transformation x ↦ Lx, then use the usual Euclidean distance, since d_M(x, x') = ||Lx − Lx'||₂.
Metric learning: choose M such that d_M(x, x') is small iff x and x' should be treated as similar (e.g., share the same label).
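A sketch of the equivalence above in code, with an arbitrary diagonal L (illustrative only):

```python
import numpy as np

def mahalanobis(x, x_prime, L):
    """Mahalanobis distance with M = L^T L, computed by transforming both
    points with L and taking the ordinary Euclidean distance."""
    return np.linalg.norm(L @ x - L @ x_prime)

# Example: L triples the weight of the second feature
L = np.diag([1.0, 3.0])
print(mahalanobis(np.array([0.0, 0.0]), np.array([1.0, 1.0]), L))  # sqrt(1 + 9) ≈ 3.162
```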
k-NN
Store the training set (X, y).
For a query (test point) x': find the k nearest points x1, x2, …, xk in X and predict y' = y(x1, x2, …, xk), usually a majority vote among the labels of x1, x2, …, xk.
Example: if y1 = 1, y2 = 1, y3 = −1, y4 = 1, y5 = −1, the majority vote gives y' = 1.
Test complexity: O(nd).
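A minimal k-NN classifier with a majority vote over the k nearest labels (an illustrative sketch; names are mine):

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x_query, k=5):
    """k-NN classification: majority vote among the labels of the k closest
    training points under the Euclidean distance (O(nd) work per query)."""
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k nearest training points
    return Counter(y[nearest]).most_common(1)[0][0]
```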
Effect of k
How to select k?
Does k-NN work? MNIST: 60k train, 10k test
Outline: Announcements, Algorithm, Theory, Application
Bayes rule
Let η(x) = P(Y = 1 | X = x).
Bayes rule: f*(X) = 1 iff η(X) ≥ ½.
Bayes error: P* = P(f*(X) ≠ Y) = E[min{η(X), 1 − η(X)}].
Derivation
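A standard argument for why thresholding η(X) at ½ is optimal (a sketch consistent with the definitions above; not necessarily the exact steps shown in lecture):

```latex
% Condition on X = x and compare any classifier f with the Bayes rule f^*:
\begin{align*}
P\big(f(X) \neq Y \mid X = x\big)
  &= \eta(x)\,\mathbf{1}\{f(x) \neq 1\} + \big(1-\eta(x)\big)\,\mathbf{1}\{f(x) = 1\} \\
  &\ge \min\{\eta(x),\, 1-\eta(x)\}
   = P\big(f^*(X) \neq Y \mid X = x\big).
\end{align*}
% Taking expectations over X gives P(f(X) \neq Y) \ge P^* for every f.
```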
Multi-class
With c classes, the Bayes rule picks the most probable class: f*(x) = argmax_y P(Y = y | X = x).
This is the best we can do even when we know the distribution of (X, Y).
How big can P* be? At most 1 − 1/c = (c − 1)/c, attained when all classes are equally likely given X.
At most twice worse (Cover & Hart '67)
Asymptotically, the 1-NN error satisfies
lim inf_n P(Y_1(n) ≠ Y) ≤ 2P* − (c/(c−1)) (P*)²
where the left-hand side is the 1-NN error, P* is the Bayes error, c is the number of classes, and c/(c−1) = 1/(max possible P*).
If P* is close to 0, then the 1-NN error is close to 2P*.
How big does n have to be?
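In LaTeX, the bound above together with its two-class specialization (the c = 2 rearrangement is mine, not from the slides):

```latex
\liminf_{n\to\infty} P\big(Y_1(n) \neq Y\big)
  \;\le\; 2P^* - \frac{c}{c-1}\,(P^*)^2,
\qquad\text{and for } c = 2:\quad
  2P^* - 2(P^*)^2 \;=\; 2P^*\,(1 - P^*) \;\le\; 2P^*.
```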
The picture
Both assume we have an infinite amount of training data!
1-NN vs k-NN
error(1NN) = 1/2n
error(kNN) for k = 2t+1:
Curse of dimensionality
Theorem (SSBD, p. 224). For any c > 1 and any learning algorithm L, there exists a distribution over [0,1]^d × {0,1} such that the Bayes error is 0, but for sample size n ≤ (c+1)^d / 2, the error of the rule L is greater than ¼.
k-NN is effective when we have many training samples.
Dimensionality reduction may be helpful.
Outline: Announcements, Algorithm, Theory, Application
Locally linear embedding (Saul & Roweis '00)
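The slide names the technique without code; as one possible illustration, scikit-learn ships an LLE implementation (a sketch on synthetic data, not the lecture's example; parameter values are arbitrary):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# Unroll a 3-D "swiss roll" into 2 dimensions while preserving local neighbourhoods
X, _ = make_swiss_roll(n_samples=1000, random_state=0)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
X_2d = lle.fit_transform(X)   # shape (1000, 2); can be fed to k-NN afterwards
```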
k-NN for regression
Training: store the training set (X, y).
Query x': find the k nearest neighbours x1, x2, …, xk in X and output y' = y(x1, x2, …, xk) = (y1 + y2 + … + yk)/k.
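A sketch of k-NN regression by averaging the k nearest real-valued labels (names are mine):

```python
import numpy as np

def knn_regress(X, y, x_query, k=5):
    """k-NN regression: average the real-valued labels of the k nearest
    training points under the Euclidean distance."""
    dists = np.linalg.norm(X - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return y[nearest].mean()
```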
Questions?