METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications Part 6: Nearest and k-nearest Neighbor Classification.

Nearest Neighbor (NN) Rule & k-Nearest Neighbor (k-NN) Rule
Non-parametric classification rules:
- Linear and generalized discriminant functions
- Nearest Neighbor & k-NN rules
NN Rule (1-NN): a direct classification using the learning samples.
Assume we have M learning samples from all categories, where Xik denotes the k'th sample from the i'th category, and assume a distance measure d(X, X') between samples.

A general distance metric should obey the following rules: non-negativity, reflexivity (d(x, y) = 0 only when x = y), symmetry, and the triangle inequality (these are discussed in more detail later under Distance Measures). Most standard choice: the Euclidean distance, d(X, Y) = sqrt( Σi (xi − yi)² ).

1-NN Rule: Given an unknown sample X, decide ωi if min_k d(X, Xik) < min_l d(X, Xjl) for all j ≠ i. That is, assign X to category ωi if the closest neighbor of X is from category i.
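As a concrete illustration, a minimal 1-NN sketch in Python (not from the lecture; numpy is assumed, and the samples and labels are made up):

    import numpy as np

    def nn_classify(X_train, y_train, x):
        """1-NN rule: assign x to the category of its closest learning sample."""
        dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every sample
        return y_train[np.argmin(dists)]

    # made-up samples from two categories
    X_train = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 3.0], [3.5, 2.5]])
    y_train = np.array([0, 0, 1, 1])
    print(nn_classify(X_train, y_train, np.array([2.8, 2.9])))   # prints 1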

Example: Find the decision boundary for the problem below (figure not reproduced here). The decision boundary of the 1-NN rule is piecewise linear.
k-NN rule: instead of looking only at the closest sample, we look at the k nearest neighbors of X and take a vote; the category with the largest vote wins. k is usually taken as an odd number so that no ties occur.
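A corresponding sketch of the voting step, using the same kind of numpy arrays as above (illustrative code, not the lecture's):

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x, k=3):
        """k-NN rule: take a vote among the k nearest learning samples."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]              # indices of the k closest samples
        votes = Counter(y_train[nearest].tolist())
        return votes.most_common(1)[0][0]            # category with the largest vote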

The k-NN rule is shown to approach optimal (Bayes) classification when k becomes very large, provided the number of samples grows even faster (k/M → 0).
k-l NN (NN with a reject option): decide ωi only if the majority vote is higher than a given threshold l; otherwise reject. Example: k = 5 with threshold l = 4. If the winning category receives fewer than 4 of the 5 votes, X is rejected, i.e. it is not classified.
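The reject option can be grafted onto the same voting code. The sketch below interprets the threshold l as a minimum number of votes, which is an assumption about how the slide's threshold is meant:

    import numpy as np
    from collections import Counter

    def knn_reject(X_train, y_train, x, k=5, l=4, reject=-1):
        """(k, l)-NN rule: accept the winning category only if it collects at least l of the k votes."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        label, count = Counter(y_train[nearest].tolist()).most_common(1)[0]
        return label if count >= l else reject       # reject when the majority is too weak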

Analysis of the NN rule is possible when M → ∞, and it was shown that its error rate is no worse than twice that of the minimum-error (Bayes) classification.
EDITING AND CONDENSING
The NN rule is very attractive because of its simplicity and yet good performance, so it becomes important to reduce the computational cost involved. The idea is an intelligent elimination of the samples: remove the samples that do not contribute to the decision boundary.

Voronoi Diagrams: each sample X' defines a Voronoi cell, a polygon such that any point that falls in it is closer to X' than to any other sample. The 1-NN decision boundary is formed by the cell faces that separate samples of different categories.

So the editing rule consists of throwing away all samples whose Voronoi polygon does not share a boundary with the polygon of a sample from another category.
NN Editing Algorithm (a code sketch is given after the list):
- Construct the Voronoi diagram for all samples.
- Find the Voronoi neighbors of each sample X'.
- If any neighbor is from another category, keep X'; else remove it from the set.
- Construct the Voronoi diagram with the remaining samples and use it for classification.
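One possible implementation sketch of this editing rule uses the fact that two samples are Voronoi neighbors exactly when they are joined by an edge of the Delaunay triangulation. scipy is assumed to be available and the function name is illustrative:

    import numpy as np
    from scipy.spatial import Delaunay

    def voronoi_edit(X, y):
        """Keep only samples whose Voronoi cell shares a boundary with a
        cell belonging to a sample from another category."""
        tri = Delaunay(X)
        indptr, indices = tri.vertex_neighbor_vertices   # Delaunay edges = Voronoi neighbors
        keep = []
        for i in range(len(X)):
            neighbors = indices[indptr[i]:indptr[i + 1]]
            if np.any(y[neighbors] != y[i]):             # borders another category: it shapes the boundary
                keep.append(i)
        return X[keep], y[keep]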

Advantage of the NN rule: no learning and no density estimation, so it is easy to implement. Disadvantage: classification is more expensive, so people found ways to reduce the cost of the NN rule.
Analysis of NN and k-NN rules: possible when M → ∞. When M → ∞, X (the unknown sample) and X' (its nearest neighbor) get very close, so P(ωi | X') ≈ P(ωi | X); that is, we are selecting category ωi with probability P(ωi | X), the a-posteriori probability.

Error Bounds and Relation with the Bayes Rule: Assume
- E*: error rate of the Bayes rule (a number between 0 and 1)
- E1: error rate of 1-NN
- Ek: error rate of k-NN
It can be shown that, as M → ∞, for 2 categories E* ≤ E1 ≤ 2E*(1 − E*), and for c categories E* ≤ E1 ≤ E*(2 − (c/(c−1))E*). The 1-NN error rate is always no worse than twice the Bayes error rate!
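As a small numerical illustration of the 2-category bound above: with a Bayes error of E* = 0.1, the bound gives E1 ≤ 2(0.1)(1 − 0.1) = 0.18, which is indeed below twice the Bayes error, 0.2.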

Bound for k = 1 (figure of the bounds not reproduced): the highest error occurs when all class densities are the same, so that the posteriors are equal, P(ωi | X) = 1/c for all i; then E1 = E* = (c − 1)/c and the upper bound is attained.

Distance Measures (Metrics): a metric d(·,·) must satisfy
- Non-negativity: d(x, y) ≥ 0
- Reflexivity: d(x, y) = 0 if and only if x = y
- Symmetry: d(x, y) = d(y, x)
- Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)
The Euclidean distance satisfies these, but it is not always a meaningful measure. Consider 2 features with different units: a scaling problem. If the scale of one feature is changed to half, the nearest neighbor may change. Solution: scaling. Normalize the data by re-scaling each feature (reduce its range to 0-1); a small sketch is given below.
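A minimal re-scaling sketch, using a simple min-max normalization as one common way to realize the 0-1 scaling mentioned above (illustrative code):

    import numpy as np

    def minmax_scale(X):
        """Re-scale each feature (column) to the range 0-1 so that no feature
        dominates the distance only because of its units."""
        mins, maxs = X.min(axis=0), X.max(axis=0)
        ranges = np.where(maxs > mins, maxs - mins, 1.0)  # guard against constant features
        return (X - mins) / ranges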

Minkowski Metric: a general definition for distance measures, the Lk norm d_k(X, Y) = ( Σi |xi − yi|^k )^(1/k).
- L1 norm: city block (Manhattan) distance, useful in digital problems.
- L2 norm: Euclidean distance.
So use different k's depending on your problem. (Figure omitted: city-block legs a and b versus the straight Euclidean line between points X and Y.)
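A small sketch of the general Minkowski distance; minkowski(x, y, 1) gives the city-block distance and minkowski(x, y, 2) the Euclidean distance (illustrative code, numpy assumed):

    import numpy as np

    def minkowski(x, y, k=2):
        """Minkowski (Lk) distance: (sum_i |x_i - y_i|^k)^(1/k)."""
        return np.sum(np.abs(x - y) ** k) ** (1.0 / k)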

Computational Complexity: consider n samples in d dimensions (n = number of samples, d = dimension). In the crude 1-NN rule, we measure the distance from X to all samples, so classification costs O(dn) per unknown sample (for the Bayes classifier the cost does not grow with n, since only the c discriminant functions are evaluated). To reduce the cost, several approaches exist:
- Partial distance: while accumulating the distance to a candidate, stop as soon as the partial sum already exceeds the distance to the closest sample found so far (a sketch is given below).
- Search-tree approach: organize the samples so that only a subset needs to be searched.
- Editing (condensing), already discussed: in the figure of Voronoi neighborhoods, samples whose neighbors are all of the same category (all neighbors black, or all neighbors red) can be removed.
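A hedged sketch of the partial-distance idea (illustrative code, not the lecture's): the squared distance to each candidate is accumulated one dimension at a time, and the candidate is abandoned as soon as the partial sum exceeds the best distance found so far.

    import numpy as np

    def nn_partial_distance(X_train, y_train, x):
        """1-NN search with partial distances."""
        best_d2, best_i = np.inf, 0
        for i, s in enumerate(X_train):
            d2 = 0.0
            for a, b in zip(s, x):                 # accumulate one dimension at a time
                d2 += (a - b) ** 2
                if d2 >= best_d2:                  # can no longer be the nearest neighbor
                    break
            else:                                  # full distance computed and it is the best so far
                best_d2, best_i = d2, i
        return y_train[best_i]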