Data Mining Classification: Alternative Techniques

Slides:



Advertisements
Similar presentations
Lecture 3-4: Classification & Clustering
Advertisements

K-NEAREST NEIGHBORS AND DECISION TREE Nonparametric Supervised Learning.
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Indian Statistical Institute Kolkata
Lazy vs. Eager Learning Lazy vs. eager learning
Chapter 6. Classification and Prediction
1er. Escuela Red ProTIC - Tandil, de Abril, Instance-Based Learning 4.1 Introduction Instance-Based Learning: Local approximation to the.
Classification and Decision Boundaries
Data Mining Classification: Naïve Bayes Classifier
Data Mining Classification: Alternative Techniques
Navneet Goyal. Instance Based Learning  Rote Classifier  K- nearest neighbors (K-NN)  Case Based Resoning (CBR)
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Advanced Concepts and Algorithms Figures for Chapter 9 Introduction.
CS 590M Fall 2001: Security Issues in Data Mining Lecture 3: Classification.
Carla P. Gomes CS4700 CS 4700: Foundations of Artificial Intelligence Carla P. Gomes Module: Nearest Neighbor Models (Reading: Chapter.
1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Cluster Analysis: Basic Concepts and Algorithms Figures for Chapter 8 Introduction.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Anomaly Detection Figures for Chapter 10 Introduction to Data Mining by Tan,
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Lecture outline Classification Naïve Bayes classifier Nearest-neighbor classifier.
1 Nearest Neighbor Learning Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI.
CES 514 – Data Mining Lec 9 April 14 Mid-term k nearest neighbor.
Aprendizagem baseada em instâncias (K vizinhos mais próximos)
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Classification: Alternative Techniques Figures for Chapter 5 Introduction to.
K Nearest Neighbor Classification Methods Qiang Yang.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining: Exploring Data Figures for Chapter 3 Introduction to Data Mining by Tan, Steinbach,
INSTANCE-BASE LEARNING
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
K Nearest Neighbor Classification Methods Qiang Yang.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Association Analysis: Advanced Concepts Figures for Chapter 7 Introduction to.
Distance and Similarity Measures
Nearest Neighbor Learning
DATA MINING LECTURE 11 Classification Nearest Neighbor Classification Support Vector Machines Logistic Regression Naïve Bayes Classifier Supervised Learning.
K Nearest Neighborhood (KNNs)
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.
10/5/2015Data Mining: Concepts and Techniques1 Chapter 6. Classification and Prediction What is classification? What is prediction? Issues regarding classification.
1 Data Mining Lecture 5: KNN and Bayes Classifiers.
K nearest neighbor classification Presented by Vipin Kumar University of Minnesota Based on discussion in "Intro to Data Mining" by Tan,
Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Statistical Inference (By Michael Jordon) l Bayesian perspective –conditional perspective—inferences.
Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.
 Classification 1. 2  Task: Given a set of pre-classified examples, build a model or classifier to classify new cases.  Supervised learning: classes.
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
K Nearest Neighbor Classification Methods. Training Set.
Outline K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
Lazy Learners K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
Machine Learning: Decision Trees Homework 4 assigned courtesy: Geoffrey Hinton, Yann LeCun, Tan, Steinbach, Kumar.
DATA MINING LECTURE 10b Classification k-nearest neighbor classifier
CS Machine Learning Instance Based Learning (Adapted from various sources)
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
Fall 2004, CIS, Temple University CIS527: Data Warehousing, Filtering, and Mining Lecture 8 Alternative Classification Algorithms Lecture slides taken/modified.
KNN & Naïve Bayes Hongning Wang
Data and its Distribution. The popular table  Table (relation)  propositional, attribute-value  Example  record, row, instance, case  Table represents.
k-Nearest neighbors and decision tree
Data Mining Classification: Alternative Techniques
Classification Nearest Neighbor
Instance Based Learning (Adapted from various sources)
K Nearest Neighbor Classification
Classification Nearest Neighbor
Nearest-Neighbor Classifiers
Instance Based Learning
COSC 4335: Other Classification Techniques
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier
Nearest Neighbor Classifiers
Nearest Neighbors CSC 576: Data Mining.
Data Mining Classification: Alternative Techniques
Nearest Neighbor Classifiers
CSE4334/5334 Data Mining Lecture 7: Classification (4)
Presentation transcript:

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1

Instance-Based Classifiers Store the training records Use training records to predict the class label of unseen cases

Instance Based Classifiers Examples: Rote-learner Memorizes entire training data and performs classification only if attributes of record match one of the training examples exactly Nearest neighbor Uses k “closest” points (nearest neighbors) for performing classification

Nearest Neighbor Classifiers Basic idea: If it walks like a duck, quacks like a duck, then it’s probably a duck Training Records Test Record Compute Distance Choose k of the “nearest” records

Nearest-Neighbor Classifiers Requires three things The set of stored records Distance Metric to compute distance between records The value of k, the number of nearest neighbors to retrieve To classify an unknown record: Compute distance to other training records Identify k nearest neighbors Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote)

Definition of Nearest Neighbor K-nearest neighbors of a record x are data points that have the k smallest distance to x

1 nearest-neighbor Voronoi Diagram

Nearest Neighbor Classification Compute distance between two points: Euclidean distance Determine the class from nearest neighbor list take the majority vote of class labels among the k-nearest neighbors Weigh the vote according to distance weight factor, w = 1/d2

Try KNN on the weather data Outlook Temperature Humidity Windy Play sunny hot high FALSE no TRUE overcast yes rainy mild cool normal ? testing

Nearest Neighbor Classification… Choosing the value of k: If k is too small, sensitive to noise points If k is too large, neighborhood may include points from other classes

Nearest Neighbor Classification… Scaling issues Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes Example of an attribute dominating distance computation: height of a person may vary from 1.5m to 1.8m weight of a person may vary from 90lb to 300lb income of a person may vary from $10K to $1M

Nearest neighbor Classification… k-NN classifiers are lazy learners It does not build models explicitly Unlike eager learners such as decision tree induction and rule-based systems Classifying unknown records are relatively expensive For each test, need to scan all training data

The K-Nearest Neighbor Method K neighbors in training data to the input data x: break ties arbitrarily All k neighbors will vote: majority wins Extension: Weighted KNN “K” is a variable: Often we experiment with different values of K=1, 3, 5, to find out the optimal one Why is KNN important? Often a baseline Must beat this one to claim innovation Applications of KNN Document similarity