ECE471-571 – Pattern Recognition
Lecture 10 – Nonparametric Density Estimation – k-Nearest-Neighbor (kNN)
Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science, University of Tennessee, Knoxville
http://www.eecs.utk.edu/faculty/qi
Email: hqi@utk.edu

Pattern Classification: A Roadmap
- Statistical approach
  - Supervised
    - Basic concepts: Bayesian decision rule (MPP, LR, discriminant functions)
    - Parameter estimation (ML, BL)
    - Non-parametric learning (kNN)
    - LDF (Perceptron)
    - NN (BP)
    - Support vector machine
    - Deep learning (DL)
  - Unsupervised
    - Basic concepts: distance
    - Agglomerative method
    - k-means
    - Winner-takes-all
    - Kohonen maps
    - Mean-shift
- Non-statistical approach
  - Decision tree
  - Syntactic approach
- Dimensionality reduction: FLD, PCA
- Performance evaluation: ROC curve (TP, TN, FN, FP), cross validation
- Stochastic methods: local opt. (GD), global opt. (SA, GA)
- Classifier fusion: majority voting, NB, BKS

Review – Bayes Decision Rule
- Maximum posterior probability: assign x to the class with the largest posterior P(ω_i|x).
- Likelihood ratio: if p(x|ω1)P(ω1) > p(x|ω2)P(ω2), then x belongs to class 1; otherwise, class 2.
- Discriminant functions:
  - Case 1: minimum Euclidean distance (linear machine), Σ_i = σ²I
  - Case 2: minimum Mahalanobis distance (linear machine), Σ_i = Σ
  - Case 3: quadratic classifier, Σ_i arbitrary
- Parameter estimation: Gaussian, two-modal Gaussian
- Dimensionality reduction
- Performance evaluation and ROC curve

Motivation
Estimate the density function directly, without assuming that the pdf has a particular parametric form.

*Problems with Parzen Windows
- Discontinuous window function → replace the hypercube window with a Gaussian kernel
- The choice of the window width h
- Still another one: the volume is fixed everywhere, regardless of the local sample density
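For contrast, here is a minimal sketch (not from the slides) of a Parzen estimate with a Gaussian kernel, the smooth replacement for the discontinuous window; the function name parzen_density and the fixed width h are illustrative choices:

    import numpy as np

    def parzen_density(x, samples, h):
        """Parzen estimate with a Gaussian kernel of fixed width h:
        p(x) ~ (1/n) * sum_i N(x; x_i, h^2 I)."""
        n, d = samples.shape
        sq = np.sum((samples - x) ** 2, axis=1)          # squared distances to x
        norm = (2 * np.pi * h ** 2) ** (d / 2)           # Gaussian normalizer
        return np.mean(np.exp(-sq / (2 * h ** 2)) / norm)

Note that the effective volume (through h) is the same everywhere; this is exactly the limitation kNN removes by letting the cell grow to fit the data.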

kNN (k-Nearest Neighbor) To estimate p(x) from n samples, we center a cell at x and let it grow until it contains k_n samples, where k_n can be some function of n. The estimate is then p(x) ≈ k_n/(nV), where V is the volume of the resulting cell. Normally, we let k_n = √n.
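A minimal sketch of this estimator (assuming Euclidean distance and a brute-force search; knn_density is an illustrative name):

    import numpy as np
    from math import gamma, pi

    def knn_density(x, samples, k):
        """kNN estimate p(x) ~ k / (n V), where V is the volume of the
        smallest hypersphere centered at x that contains k samples."""
        n, d = samples.shape
        r = np.sort(np.linalg.norm(samples - x, axis=1))[k - 1]  # k-th NN radius
        V = (pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d          # sphere volume
        return k / (n * V)

For n = 1000 samples, the rule of thumb above gives k = √1000 ≈ 32.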

kNN in Classification Given c training sets from the c classes, with n_m samples in class m, the total number of samples is n = n_1 + … + n_c. Given a point x at which we wish to determine the statistics, we find the hypersphere of volume V which just encloses k points from the combined set. If, within that volume, k_m of those points belong to class m, then we estimate the density for class m by p(x|ω_m) ≈ k_m/(n_m V).
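Combining these estimates with Bayes' rule shows why the neighbor counts alone decide the class (a standard derivation, filled in here): with priors P(ω_m) ≈ n_m/n and unconditional density p(x) ≈ k/(nV),

P(ω_m|x) = p(x|ω_m) P(ω_m) / p(x) ≈ [k_m/(n_m V)] · [n_m/n] / [k/(nV)] = k_m/k

so the class with the most points among the k neighbors has the largest estimated posterior.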

kNN Classification Rule The decision rule tells us to look in a neighborhood of the unknown feature vector that contains k samples. If, within that neighborhood, more samples lie in class i than in any other class, we assign the unknown to class i.
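A minimal sketch of the rule (brute-force distances; ties are broken by whichever class Counter encounters first):

    import numpy as np
    from collections import Counter

    def knn_classify(x, samples, labels, k):
        """Vote among the k nearest training samples; since k_m/k estimates
        P(w_m|x), the majority class is the maximum-posterior decision."""
        nearest = np.argsort(np.linalg.norm(samples - x, axis=1))[:k]
        return Counter(labels[i] for i in nearest).most_common(1)[0][0]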

Problems
- What kind of distance should be used to measure "nearest"? The Euclidean metric is a reasonable choice.
- Computation burden: we need to compute the distance from the unknown sample to all the stored samples.
- Massive storage burden: the entire training set must be kept.

Computational Complexity of kNN
The cost is high in both space (storage) and time (search). Algorithms for reducing the computational burden:
- Computing partial distances
- Prestructuring
- Editing the stored prototypes

Method 1 – Partial Distance Calculate the distance using only some subset r of the full d dimensions. If this partial distance is already too large, we don't need to compute further: the squared Euclidean distance can only grow as more dimensions are added, so what we know about the subspace is indicative of the full space.
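A minimal sketch of the idea for nearest-neighbor search (assuming squared Euclidean distance and a plain linear scan; partial_distance_nn is an illustrative name):

    import numpy as np

    def partial_distance_nn(x, samples):
        """Abandon a candidate as soon as its accumulated squared distance
        over the first dimensions already exceeds the best full distance."""
        best_i, best_d2 = -1, np.inf
        for i, s in enumerate(samples):
            d2 = 0.0
            for xj, sj in zip(x, s):
                d2 += (xj - sj) ** 2
                if d2 >= best_d2:        # partial distance already too large
                    break
            else:                        # no break: d2 is the full distance
                best_i, best_d2 = i, d2
        return best_i, np.sqrt(best_d2)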

Method 2 – Prestructuring Prestructure the samples in the training set into a search tree where samples are selectively linked. Distances are calculated between the testing sample and a few stored "root" samples; we then consider only the samples linked to the closest root. The tradeoff: finding the true closest neighbor is no longer guaranteed. [Figure: a scatter of x and o training samples illustrating the tree-structured search.]
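The slide describes a generic search tree; a concrete, widely used prestructuring is the k-d tree. A minimal sketch using SciPy (assuming SciPy is available; note that an exact k-d tree query does still return the true nearest neighbors, unlike the greedy root-linking scheme above):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    train = rng.standard_normal((10_000, 3))

    tree = cKDTree(train)                 # build the structure once
    test = rng.standard_normal(3)
    dist, idx = tree.query(test, k=5)     # 5 nearest neighbors of test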

Method 3 – Editing/Pruning/Condensing Eliminate "useless" samples during training; for example, eliminate samples that are surrounded by training points of the same category label. This leaves the decision boundaries unchanged while reducing storage.
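A minimal sketch of the editing rule the slide describes (the name edit_prototypes and the neighborhood size k=5 are illustrative; a sample is dropped when all of its k nearest neighbors share its label):

    import numpy as np

    def edit_prototypes(samples, labels, k=5):
        """Drop samples whose k nearest neighbors (excluding themselves) all
        carry the same label -- interior points far from any boundary."""
        keep = []
        for i, x in enumerate(samples):
            dists = np.linalg.norm(samples - x, axis=1)
            neighbors = np.argsort(dists)[1:k + 1]   # index 0 is the sample itself
            if not np.all(labels[neighbors] == labels[i]):
                keep.append(i)
        return samples[keep], labels[keep]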

Discussion
- The three methods above can be combined.
- Other concerns: the choice of k and the choice of metric.

Distance (Metrics) Used by kNN
Properties of a metric:
- Nonnegativity: D(a, b) ≥ 0
- Reflexivity: D(a, b) = 0 if and only if a = b
- Symmetry: D(a, b) = D(b, a)
- Triangle inequality: D(a, b) + D(b, c) ≥ D(a, c)
Different metrics: the Minkowski metric L_k(a, b) = (Σ_{i=1..d} |a_i − b_i|^k)^(1/k)
- k = 1: Manhattan distance (city block distance)
- k = 2: Euclidean distance
- k → ∞: the maximum of the projected distances onto each of the d coordinate axes (Chebyshev distance)
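A minimal sketch of the Minkowski family (the name minkowski is illustrative; passing np.inf selects the Chebyshev limit):

    import numpy as np

    def minkowski(a, b, k):
        """L_k distance: k=1 Manhattan, k=2 Euclidean, k=inf Chebyshev."""
        diff = np.abs(np.asarray(a, float) - np.asarray(b, float))
        return diff.max() if np.isinf(k) else (diff ** k).sum() ** (1.0 / k)

    # minkowski([0, 0], [3, 4], 1)      -> 7.0
    # minkowski([0, 0], [3, 4], 2)      -> 5.0
    # minkowski([0, 0], [3, 4], np.inf) -> 4.0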

Visualizing the Distances