CES 514 – Data Mining Lec 9 April 14 Mid-term k nearest neighbor.

Slides:

Advertisements

Similar presentations

The Software Infrastructure for Electronic Commerce Databases and Data Mining Lecture 4: An Introduction To Data Mining (II) Johannes Gehrke

Advertisements

Lecture 3-4: Classification & Clustering

Data Mining Classification: Alternative Techniques

Classification: Alternative Techniques

Data Mining Classification: Alternative Techniques

Data Mining Classification: Alternative Techniques

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.

Data Mining Classification: Alternative Techniques

Lazy vs. Eager Learning Lazy vs. eager learning

1er. Escuela Red ProTIC - Tandil, de Abril, Instance-Based Learning 4.1 Introduction Instance-Based Learning: Local approximation to the.

Classification and Decision Boundaries

Data Mining Classification: Alternative Techniques

Navneet Goyal. Instance Based Learning  Rote Classifier  K- nearest neighbors (K-NN)  Case Based Resoning (CBR)

By Fernando Seoane, April 25 th, 2006 Demo for Non-Parametric Classification Euclidean Metric Classifier with Data Clustering.

Carla P. Gomes CS4700 CS 4700: Foundations of Artificial Intelligence Carla P. Gomes Module: Nearest Neighbor Models (Reading: Chapter.

1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.

CES 514 – Data Mining Lecture 8 classification (contd…)

© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.

Lazy Learning k-Nearest Neighbour Motivation: availability of large amounts of processing power improves our ability to tune k-NN classifiers.

Comparison of Instance-Based Techniques for Learning to Predict Changes in Stock Prices iCML Conference December 10, 2003 Presented by: David LeRoux.

Data Mining Classification: Alternative Techniques

Lecture outline Classification Naïve Bayes classifier Nearest-neighbor classifier.

1 Nearest Neighbor Learning Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI.

Aprendizagem baseada em instâncias (K vizinhos mais próximos)

KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.

Instance Based Learning Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán) 1.

INSTANCE-BASE LEARNING

The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.

Nearest Neighbor Classifiers other names: –instance-based learning –case-based learning (CBL) –non-parametric learning –model-free learning.

Module 04: Algorithms Topic 07: Instance-Based Learning

DATA MINING LECTURE 11 Classification Nearest Neighbor Classification Support Vector Machines Logistic Regression Naïve Bayes Classifier Supervised Learning.

K Nearest Neighborhood (KNNs)

DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.

1 Data Mining Lecture 5: KNN and Bayes Classifiers.

Basic Data Mining Technique

K nearest neighbor classification Presented by Vipin Kumar University of Minnesota Based on discussion in "Intro to Data Mining" by Tan,

Nearest Neighbor (NN) Rule & k-Nearest Neighbor (k-NN) Rule Non-parametric : Can be used with arbitrary distributions, No need to assume that the form.

 2003, G.Tecuci, Learning Agents Laboratory 1 Learning Agents Laboratory Computer Science Department George Mason University Prof. Gheorghe Tecuci 9 Instance-Based.

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.

© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Statistical Inference (By Michael Jordon) l Bayesian perspective –conditional perspective—inferences.

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.

Nearest Neighbor Ling 572 Advanced Statistical Methods in NLP January 12, 2012.

CpSc 881: Machine Learning Instance Based Learning.

Outline K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.

Lazy Learners K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.

DATA MINING LECTURE 10b Classification k-nearest neighbor classifier

Overview Data Mining - classification and clustering

CS Machine Learning Instance Based Learning (Adapted from various sources)

K-Nearest Neighbor Learning.

Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.

Fall 2004, CIS, Temple University CIS527: Data Warehousing, Filtering, and Mining Lecture 8 Alternative Classification Algorithms Lecture slides taken/modified.

Instance-Based Learning Evgueni Smirnov. Overview Instance-Based Learning Comparison of Eager and Instance-Based Learning Instance Distances for Instance-Based.

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Minqi Zhou © Tan,Steinbach, Kumar Introduction.

Data Mining Classification: Alternative Techniques

Classification Nearest Neighbor

Instance Based Learning (Adapted from various sources)

K Nearest Neighbor Classification

Classification Nearest Neighbor

Nearest-Neighbor Classifiers

Data Mining extracting knowledge from a large amount of data

Instance Based Learning

COSC 4335: Other Classification Techniques

DATA MINING LECTURE 10 Classification k-nearest neighbor classifier

Nearest Neighbors CSC 576: Data Mining.

Data Mining Classification: Alternative Techniques

Nearest Neighbor Classifiers

CSE4334/5334 Data Mining Lecture 7: Classification (4)

Presentation transcript:

CES 514 – Data Mining Lec 9 April 14 Mid-term k nearest neighbor

Instance-Based Classifiers Store the training records Use training records to predict the class label of unseen cases

Nearest-Neighbor Classifiers l Requires three things –The set of stored records –Distance Metric to compute distance between records –The value of k, the number of nearest neighbors to retrieve l To classify an unknown record: –Compute distance to other training records –Identify k nearest neighbors –Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority / plurality vote)

Definition of Nearest Neighbor k-nearest neighbors of a record x are data points that have the k smallest distance to x

1 nearest-neighbor Voronoi Diagram

Voronoi Diagram applet

Nearest Neighbor Classification l Compute distance between two points: –Euclidean distance l Determine the class from nearest neighbor list –take the majority vote of class labels among the k-nearest neighbors –Weigh the vote according to distance  weight factor, w = 1/d 2

Nearest Neighbor Classification l Choosing the value of k: –If k is too small, sensitive to noise points –If k is too large, neighborhood may include points from other classes

Nearest Neighbor Classification l Scaling issues –Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes –Example:  height of a person may vary from 1.5m to 1.8m  weight of a person may vary from 90lb to 300lb  income of a person may vary from $10K to $1M

Nearest neighbor Classification l k-NN classifiers are lazy learners –It does not build models explicitly –Unlike eager learners such as decision tree induction and rule- based systems –Classifying unknown records are relatively expensive

Example: PEBLS l PEBLS: Parallel Examplar-Based Learning System (Cost & Salzberg) –Works with both continuous and nominal features  For nominal features, distance between two nominal values is computed using modified value difference metric (MVDM) –Each record is assigned a weight factor –Number of nearest neighbor, k = 1

Example: PEBLS Class Marital Status Single MarriedDivorced Yes201 No241 Distance between nominal attribute values: d(Single,Married) = | 2/4 – 0/4 | + | 2/4 – 4/4 | = 1 d(Single,Divorced) = | 2/4 – 1/2 | + | 2/4 – 1/2 | = 0 d(Married,Divorced) = | 0/4 – 1/2 | + | 4/4 – 1/2 | = 1 d(Refund=Yes,Refund=No) = | 0/3 – 3/7 | + | 3/3 – 4/7 | = 6/7 Class Refund YesNo Yes03 No34

Example: PEBLS Distance between record X and record Y: where: w X  1 if X makes accurate prediction most of the time w X > 1 if X is not reliable for making predictions