Chapter 7 – K-Nearest-Neighbor


Chapter 7 – K-Nearest-Neighbor
Data Mining for Business Analytics
Shmueli, Patel & Bruce

Characteristics
Data-driven, not model-driven
Makes no assumptions about the data

Basic Idea
For a given record to be classified, identify nearby records
"Near" means records with similar predictor values X1, X2, …, Xp
Classify the record as the predominant class among the nearby records (the "neighbors")

How to Measure "Nearby"?
The most popular distance measure is Euclidean distance: the square root of the sum of squared differences across the p predictors
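Putting the basic idea and the distance measure together, a minimal sketch (the income/lot-size numbers below are invented toy data for illustration, not the book's riding-mower data set):

```python
import math
from collections import Counter

def euclidean(a, b):
    # Square root of the sum of squared differences across the predictors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(new_record, train_X, train_y, k=3):
    # Rank training records by distance to the new record,
    # then take a majority vote among the k nearest labels
    order = sorted(range(len(train_X)),
                   key=lambda i: euclidean(new_record, train_X[i]))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Toy (income, lot size) records and ownership labels
X = [(60, 18), (85, 20), (65, 19), (50, 16), (90, 22), (45, 15)]
y = ["owner", "owner", "owner", "nonowner", "owner", "nonowner"]
print(knn_classify((55, 17), X, y, k=3))  # owner
```

Two of the three nearest neighbors of (55, 17) are owners, so the majority vote classifies it as "owner".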

Choosing k
k is the number of nearby neighbors used to classify the new record
k = 1 means use the single nearest record
k = 5 means use the 5 nearest records
Typically, choose the value of k that has the lowest error rate on the validation data
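The validation-based selection of k can be sketched as a simple search: score the validation set for each candidate k and keep the smallest k that attains the lowest error rate (the data here are hypothetical toy examples):

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(rec, X, y, k):
    order = sorted(range(len(X)), key=lambda i: euclidean(rec, X[i]))
    return Counter(y[i] for i in order[:k]).most_common(1)[0][0]

def best_k(train_X, train_y, val_X, val_y):
    # Error rate on the validation set for every candidate k
    errors = {}
    for k in range(1, len(train_X) + 1):
        wrong = sum(knn_classify(r, train_X, train_y, k) != t
                    for r, t in zip(val_X, val_y))
        errors[k] = wrong / len(val_X)
    lowest = min(errors.values())
    # Ties broken in favor of the lowest k, as the slide recommends
    return min(k for k, e in errors.items() if e == lowest)

train_X = [(0,), (1,), (10,), (11,)]
train_y = ["a", "a", "b", "b"]
val_X = [(0.5,), (10.5,)]
val_y = ["a", "b"]
print(best_k(train_X, train_y, val_X, val_y))  # 1
```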

Low k vs. High k
Low values of k (1, 3, …) capture local structure in the data (but also noise)
High values of k provide more smoothing and less noise, but may miss local structure
Note: the extreme case of k = n (i.e., the entire data set) is the same as the "naïve rule" (classify all records according to the majority class)

Example: Riding Mowers
Data: 24 households classified as owning or not owning riding mowers
Predictors: Income, Lot Size

XLMiner Output
For each record in the validation data (6 records), XLMiner finds neighbors among the training data (18 records)
Each record is scored for k = 1, 2, …, 18
Best k appears to be k = 8
k = 9, k = 10, and k = 14 also share the low error rate, but it is best to choose the lowest such k

XLMiner Output: Confusion Matrix

            Predicted 1   Predicted 0
Actual 1    TP            FN
Actual 0    FP            TN

Precision = TP/(TP + FP): of the records predicted positive, how many are actually positive?
Sensitivity = TP/(TP + FN): how well does the model catch actual positives?
True Positive Rate (TPR) – the proportion of actual positives that are predicted positive
Specificity = TN/(FP + TN): how well does the model catch actual negatives?
True Negative Rate (TNR) – the proportion of actual negatives that are predicted negative
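These measures follow directly from the four confusion-matrix counts. A small sketch, with made-up counts for illustration:

```python
def precision(tp, fp):
    # Of the records predicted positive, the share that are actually positive
    return tp / (tp + fp)

def sensitivity(tp, fn):
    # True positive rate: share of actual positives predicted positive
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: share of actual negatives predicted negative
    return tn / (tn + fp)

# Hypothetical confusion-matrix counts
tp, fn, fp, tn = 40, 10, 5, 45
print(precision(tp, fp))    # 40/45
print(sensitivity(tp, fn))  # 0.8
print(specificity(tn, fp))  # 0.9
```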

XLMiner Output: Confusion Matrix (continued)
Precision = TP/(TP + FP)
Sensitivity = TP/(TP + FN)
F1-Score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)
A value between 0 and 1; F1 = 1 means perfect prediction
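The F1 formula is the harmonic mean of precision and sensitivity, which a one-line function makes concrete:

```python
def f1_score(precision, sensitivity):
    # Harmonic mean of precision and sensitivity; 1.0 means perfect prediction
    return 2 * precision * sensitivity / (precision + sensitivity)

print(f1_score(1.0, 1.0))  # 1.0
print(f1_score(0.8, 0.8))  # 0.8 (equal inputs give back the same value)
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two inputs, so a model cannot score well by excelling on only one measure.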

SAS Output: Lift and Gain
Sort the validation records in descending order of the probability that the record belongs to the class of interest
Lift = (TP predicted by the model) / (TP predicted by the naïve method)
Gain = (cumulative TP predicted by the model) / (cumulative TP predicted by the naïve method)
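Lift at a given cutoff can be sketched as: sort by predicted probability, count the positives captured in the top slice, and divide by what the naïve method (base rate times slice size) would capture. The scores and labels below are made up for illustration:

```python
def lift_at(scores, labels, frac):
    # scores: predicted probabilities of the class of interest
    # labels: 1 for the class of interest, 0 otherwise
    # frac:   fraction of records taken from the top of the sorted list
    n = len(scores)
    take = int(n * frac)
    order = sorted(range(n), key=lambda i: -scores[i])
    tp_model = sum(labels[i] for i in order[:take])
    tp_naive = sum(labels) * frac  # naive method: base rate times slice size
    return tp_model / tp_naive

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05, 0.02]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(lift_at(scores, labels, 0.3))  # 2.5: the top 30% holds 3 of 4 positives
```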

Using k-NN for Prediction (Numerical Outcome)
Instead of "majority vote determines class," use the average of the neighbors' response values
May be a weighted average, with weight decreasing with distance
Different error metrics apply – RMSE, MAPE, etc.
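Swapping the majority vote for an average (plain or inverse-distance-weighted) turns the classifier into a predictor. A minimal sketch on invented toy data:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(rec, X, y, k=3, weighted=False):
    # Average of the k nearest numeric responses; optionally weight
    # each neighbor by inverse distance so closer records count more.
    order = sorted(range(len(X)), key=lambda i: euclidean(rec, X[i]))[:k]
    if not weighted:
        return sum(y[i] for i in order) / k
    eps = 1e-9  # avoid division by zero when a neighbor matches exactly
    w = [1.0 / (euclidean(rec, X[i]) + eps) for i in order]
    return sum(wi * y[i] for wi, i in zip(w, order)) / sum(w)

X = [(0,), (1,), (2,), (10,)]
y = [10.0, 20.0, 30.0, 100.0]
print(knn_predict((1,), X, y, k=3))  # 20.0, the plain 3-neighbor average
```

With `weighted=True`, an exact match dominates the prediction, pulling it toward that record's response.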

Advantages
Simple
No assumptions required about normal distributions, etc.
Effective at capturing complex interactions among variables without having to define a statistical model

Shortcomings
The required size of the training set increases exponentially with the number of predictors, p
This is because the expected distance to the nearest neighbor increases with p (with a large vector of predictors, all records end up "far away" from each other)
In a large training set, it takes a long time to find the distances to all the neighbors and then identify the nearest one(s)
These problems constitute the "curse of dimensionality"

Dealing with the Curse
Reduce the dimension of the predictors (e.g., with PCA)
Use computational shortcuts that settle for "almost nearest neighbors"

Summary
Find the distance between the record to be classified and all other records
Select the k nearest records
Classify the record according to the majority vote of the nearest neighbors
Or, for prediction, take the average of the nearest neighbors' responses
"Curse of dimensionality" – need to limit the number of predictors