K Nearest Neighbor Classification

Bayes Classifier: Recap
[Figure: class-conditional densities P(TUNA | L), P(SHARK | L), P(HILSA | L) plotted against the feature L]
Maximum a posteriori (MAP) rule: the distributions are assumed to belong to a particular family (e.g., Gaussian), and their parameters are estimated from the training data.

Bayes Classifier: Recap
[Figure: the same class-conditional densities, with a small window around the observed value L]
Approximate MAP rule, non-parametric (data-driven) approach: consider a small window around L and find which class is most populous in that window.

Nearest Neighbor Classifiers
Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck.
Given the training records and a test record: compute the distance from the test record to the training records, then choose the k "nearest" records.

Basic Idea
The k-NN classification rule assigns to a test sample the majority class label of its k nearest training samples.
In practice, k is usually chosen to be odd so as to avoid ties.
The k = 1 rule is generally called the nearest-neighbor classification rule.
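
A minimal sketch of this rule using NumPy; the function name and the toy data layout are illustrative, not from the original slides.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training record
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those k neighbours
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy example: two well-separated 2-D classes
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.1, 0.9]), k=3))  # -> 0
```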

Definition of Nearest Neighbor
The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Voronoi Diagram
Properties:
- Every point within a sample's Voronoi cell is closer to that sample than to any other sample.
- For any sample, the nearest sample is determined by the closest Voronoi cell edge.

Distance-weighted k-NN
Replace the unweighted vote by a weighted vote in which the i-th nearest neighbour contributes weight w_i = 1 / d(x_q, x_i), the inverse of its distance to the query x_q.
General kernel functions, such as Parzen windows, may be used instead of inverse distance.
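
A sketch of the inverse-distance vote just described (a kernel such as a Parzen window could be substituted for the weight function); the small epsilon and the function name are assumptions for illustration.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_query, k=3, eps=1e-12):
    """Distance-weighted k-NN vote: neighbour i contributes weight w_i = 1 / d_i."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)        # inverse-distance weights
    votes = {}
    for label, w in zip(y_train[nearest], weights):
        votes[label] = votes.get(label, 0.0) + w  # accumulate weighted votes per class
    return max(votes, key=votes.get)
```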

Predicting Continuous Values
Replace the majority vote by a weighted average over the k nearest neighbours: the prediction for x_q is ( Σ_{i=1..k} w_i f(x_i) ) / ( Σ_{i=1..k} w_i ).
Note: the unweighted version corresponds to w_i = 1 for all i.
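
The corresponding regression sketch, under the same illustrative naming: the prediction is the weighted mean of the neighbours' target values, and setting all w_i = 1 recovers the plain average.

```python
import numpy as np

def weighted_knn_regress(X_train, y_train, x_query, k=3, eps=1e-12):
    """Predict a continuous value as the weighted mean of the k nearest targets."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + eps)   # w_i = 1 for all i would give the unweighted mean
    return np.sum(w * y_train[nearest]) / np.sum(w)
```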

Nearest-Neighbor Classifiers: Issues
- The value of k, the number of nearest neighbors to retrieve
- Choice of distance metric to compute the distance between records
- Computational complexity
- Size of the training set
- Dimension of the data

Value of K
Choosing the value of k:
- If k is too small, the classifier is sensitive to noise points.
- If k is too large, the neighborhood may include points from other classes.
Rule of thumb: k = sqrt(N), where N is the number of training points.
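
A small illustration of the k = sqrt(N) rule of thumb; rounding to an odd value follows the earlier "choose k odd" advice, and the helper name is illustrative.

```python
import math

def rule_of_thumb_k(n_train):
    """k ~ sqrt(N), forced to be odd so that two-class votes cannot tie."""
    k = max(1, int(round(math.sqrt(n_train))))
    return k if k % 2 == 1 else k + 1

print(rule_of_thumb_k(100))  # -> 11 (sqrt(100) = 10, bumped to the next odd value)
```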

Distance Metrics
Minkowski (l_p norm) distance: d(x, y) = ( Σ_j |x_j − y_j|^p )^(1/p); p = 1 gives the Manhattan (city-block) distance and p = 2 the Euclidean distance.
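
A sketch of the Minkowski distance for a few values of p; the function name and the example points are illustrative.

```python
import numpy as np

def minkowski(x, y, p=2):
    """Minkowski (l_p) distance; p=1 is Manhattan, p=2 is Euclidean."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(a, b, p=1))  # 7.0 (Manhattan)
print(minkowski(a, b, p=2))  # 5.0 (Euclidean)
```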

Distance Measure: Scale Effects
Different features may have different measurement scales, e.g., patient weight in kg (range [50, 200]) vs. blood protein values in ng/dL (range [-3, 3]).
Consequences:
- Patient weight will have a much greater influence on the distance between samples.
- This may bias the performance of the classifier.

Standardization
Transform raw feature values into z-scores: z_ij = (x_ij − μ_j) / σ_j, where x_ij is the value of the j-th feature for the i-th sample, μ_j is the average of that feature over all input samples, and σ_j is the corresponding standard deviation.
The range and scale of the z-scores should then be similar across features (provided the distributions of the raw feature values are alike).
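
A sketch of the column-wise z-score transform described above; the same training means and standard deviations are reused for the test samples (the zero-variance guard is an added assumption).

```python
import numpy as np

def standardize(X_train, X_test):
    """Column-wise z-scores: z_ij = (x_ij - mu_j) / sigma_j."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0   # guard against constant features
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```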

Nearest Neighbor: Dimensionality
Problem with the Euclidean measure on high-dimensional data (curse of dimensionality): it can produce counter-intuitive results, and density shrinks (the sparsification effect).
Example with 12-dimensional binary vectors:
1 1 1 1 1 1 1 1 1 1 1 0  vs  0 1 1 1 1 1 1 1 1 1 1 1   d = 1.4142
1 0 0 0 0 0 0 0 0 0 0 0  vs  0 0 0 0 0 0 0 0 0 0 0 1   d = 1.4142
Both pairs are the same distance apart, even though the first pair agrees on eleven 1s and the second pair shares no 1s at all.
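
The example can be checked directly: each pair differs in exactly two coordinates, so both distances come out as sqrt(2) ≈ 1.4142.

```python
import numpy as np

a1 = np.array([1,1,1,1,1,1,1,1,1,1,1,0]); b1 = np.array([0,1,1,1,1,1,1,1,1,1,1,1])
a2 = np.array([1,0,0,0,0,0,0,0,0,0,0,0]); b2 = np.array([0,0,0,0,0,0,0,0,0,0,0,1])
print(np.linalg.norm(a1 - b1))  # 1.4142...
print(np.linalg.norm(a2 - b2))  # 1.4142...
```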

Distance for Nominal Attributes
Nominal attributes have no numeric ordering, so the distance between two values is defined from the class distributions observed for each value.
Example: for the attribute Shape with values Round and Square, compare the per-class proportions observed for each value (e.g., 6/10 for one value and 3/5 for the other).

Distance for Heterogeneous Data
Wilson, D. R. and Martinez, T. R., "Improved Heterogeneous Distance Functions", Journal of Artificial Intelligence Research, vol. 6, no. 1, pp. 1-34, 1997.
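
A rough sketch of the value-difference idea behind nominal-attribute distances, in the spirit of the reference above; the simplified formula, the function name, and the reading of the 6/10 vs. 3/5 proportions as class-conditional frequencies are all assumptions for illustration.

```python
from collections import Counter

def value_difference(values, labels, v1, v2):
    """Simplified value difference: sum over classes of |P(c | v1) - P(c | v2)|.
    `values` holds one nominal attribute's value per sample, `labels` the class labels."""
    classes = set(labels)
    def class_probs(v):
        cls = [c for val, c in zip(values, labels) if val == v]
        counts = Counter(cls)
        return {c: counts.get(c, 0) / len(cls) for c in classes}
    p1, p2 = class_probs(v1), class_probs(v2)
    return sum(abs(p1[c] - p2[c]) for c in classes)

# Shape attribute: 6/10 Round samples and 3/5 Square samples belong to class 'A'
shapes = ['Round'] * 10 + ['Square'] * 5
labels = ['A'] * 6 + ['B'] * 4 + ['A'] * 3 + ['B'] * 2
print(value_difference(shapes, labels, 'Round', 'Square'))  # 0.0: identical class proportions
```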

Nearest Neighbour: Computational Complexity
Expensive:
- To determine the nearest neighbour of a query point q, the distance to all N training examples must be computed.
- Remedies: pre-sort training examples into fast data structures (kd-trees), compute only approximate distances (LSH), or remove redundant data (condensing).
Storage requirements:
- All training data must be stored.
- Pre-sorting often increases the storage requirements.
High-dimensional data ("curse of dimensionality"):
- The required amount of training data increases exponentially with dimension.
- Computational cost also increases dramatically.
- Partitioning techniques degrade to linear search in high dimensions.

Reduction in Computational Complexity
- Reduce the size of the training set (condensation, editing).
- Use geometric data structures for high-dimensional search.

Condensation: Decision Regions
Each cell contains one sample, and every location within the cell is closer to that sample than to any other sample. A Voronoi diagram divides the space into such cells. Every query point will be assigned the classification of the sample within that cell. The decision boundary separates the class regions based on the 1-NN decision rule. Knowledge of this boundary is sufficient to classify new points. The boundary itself is rarely computed; many algorithms seek to retain only those points necessary to generate an identical boundary.

Condensing
Aim: reduce the number of training samples, retaining only the samples that are needed to define the decision boundary.
- Decision-boundary consistent: a subset whose nearest-neighbour decision boundary is identical to the boundary of the entire training set.
- Minimum consistent set: the smallest subset of the training data that correctly classifies all of the original training data.
[Figure: original data, condensed data, minimum consistent set]

Condensing: Condensed Nearest Neighbour (CNN)
Properties: incremental; order dependent; neither minimal nor decision-boundary consistent; O(n^3) for the brute-force method.
Algorithm:
1. Initialize the subset with a single (or k) training example(s).
2. Classify all remaining samples using the subset, and transfer any incorrectly classified samples to the subset.
3. Return to step 2 until no transfers occur or the subset is full.
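
A minimal sketch of the CNN loop above, classifying against the growing subset with 1-NN; the function name and the pass limit are illustrative assumptions.

```python
import numpy as np

def condense(X, y, max_passes=100):
    """Condensed Nearest Neighbour: keep only samples needed to classify the rest."""
    subset = [0]                                   # step 1: start with a single example
    for _ in range(max_passes):
        transferred = False
        for i in range(len(X)):
            if i in subset:
                continue
            # step 2: classify sample i with 1-NN over the current subset
            d = np.linalg.norm(X[subset] - X[i], axis=1)
            pred = y[subset][np.argmin(d)]
            if pred != y[i]:                       # misclassified -> transfer to subset
                subset.append(i)
                transferred = True
        if not transferred:                        # step 3: stop after a pass with no transfers
            break
    return np.array(sorted(subset))
```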

High-dimensional Search
Given a point set and a nearest-neighbour query point:
- Find the points enclosed in a rectangle (range) around the query.
- Perform a linear search for the nearest neighbour only within that rectangle.

kd-tree: Data Structure for Range Search
- Index the data into a tree, then search on the tree.
- Tree construction: at each level, a different dimension is used to split.
[Figure: kd-tree over points A-E, splitting on x = 5 at the root (x < 5 vs. x >= 5), then on y = 6 and y = 3, then on x = 6]

kd-tree Example
[Figure: an example kd-tree and the corresponding spatial partition, with splits at x = 3, x = 5, x = 7, x = 8 and y = 2, y = 5, y = 6]
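
In practice a kd-tree library can answer both the range and nearest-neighbour queries sketched above; a minimal usage example with SciPy, assuming SciPy is available (the data and radius are illustrative).

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 2))                    # 2-D training points
tree = KDTree(points)                             # build the kd-tree once

query = np.array([0.5, 0.5])
dists, idx = tree.query(query, k=3)               # 3 nearest neighbours of the query
in_range = tree.query_ball_point(query, r=0.05)   # all points within radius 0.05 (range search)
print(idx, in_range)
```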

KNN: Alternate Terminologies
- Instance-based learning
- Lazy learning
- Case-based reasoning
- Exemplar-based learning