Data Mining Classification: Alternative Techniques

Slides:



Advertisements
Similar presentations
Data Mining Classification: Alternative Techniques
Advertisements

1 Classification using instance-based learning. 3 March, 2000Advanced Knowledge Management2 Introduction (lazy vs. eager learning) Notion of similarity.
Lecture 3-4: Classification & Clustering
K-NEAREST NEIGHBORS AND DECISION TREE Nonparametric Supervised Learning.
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Data Mining Cluster Analysis: Basic Concepts and Algorithms
Lazy vs. Eager Learning Lazy vs. eager learning
Classification and Decision Boundaries
Data Mining Classification: Alternative Techniques
Navneet Goyal. Instance Based Learning  Rote Classifier  K- nearest neighbors (K-NN)  Case Based Resoning (CBR)
Carla P. Gomes CS4700 CS 4700: Foundations of Artificial Intelligence Carla P. Gomes Module: Nearest Neighbor Models (Reading: Chapter.
1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Data Mining Classification: Alternative Techniques
Lecture outline Classification Naïve Bayes classifier Nearest-neighbor classifier.
CES 514 – Data Mining Lec 9 April 14 Mid-term k nearest neighbor.
Aprendizagem baseada em instâncias (K vizinhos mais próximos)
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
© Tan,Steinbach, Kumar Introduction to Data Mining 1/17/ Data Mining Classification: Alternative Techniques Figures for Chapter 5 Introduction to.
Nearest Neighbour Condensing and Editing David Claus February 27, 2004 Computer Vision Reading Group Oxford.
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
K Nearest Neighborhood (KNNs)
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.
1 Data Mining Lecture 5: KNN and Bayes Classifiers.
K nearest neighbor classification Presented by Vipin Kumar University of Minnesota Based on discussion in "Intro to Data Mining" by Tan,
Nearest Neighbor (NN) Rule & k-Nearest Neighbor (k-NN) Rule Non-parametric : Can be used with arbitrary distributions, No need to assume that the form.
Non-Parameter Estimation 主講人:虞台文. Contents Introduction Parzen Windows k n -Nearest-Neighbor Estimation Classification Techiques – The Nearest-Neighbor.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Statistical Inference (By Michael Jordon) l Bayesian perspective –conditional perspective—inferences.
David Claus and Christoph F. Eick: Nearest Neighbor Editing and Condensing Techniques Nearest Neighbor Editing and Condensing Techniques 1.Nearest Neighbor.
Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.
Chapter 6 – Three Simple Classification Methods © Galit Shmueli and Peter Bruce 2008 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications Part 6: Nearest and k-nearest Neighbor Classification.
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
Outline K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
DATA MINING LECTURE 10b Classification k-nearest neighbor classifier
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
CS Machine Learning Instance Based Learning (Adapted from various sources)
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
Instance-Based Learning Evgueni Smirnov. Overview Instance-Based Learning Comparison of Eager and Instance-Based Learning Instance Distances for Instance-Based.
KNN & Naïve Bayes Hongning Wang
Hierarchical Clustering: Time and Space requirements
k-Nearest neighbors and decision tree
Non-Parameter Estimation
Data Science Algorithms: The Basic Methods
Instance Based Learning
Data Mining Classification: Alternative Techniques
Information Retrieval
Classification Nearest Neighbor
Data Mining Classification: Alternative Techniques
Chapter 7 – K-Nearest-Neighbor
Data Mining Classification: Basic Concepts and Techniques
Lecture Notes for Chapter 9 Introduction to Data Mining, 2nd Edition
Instance Based Learning (Adapted from various sources)
K Nearest Neighbor Classification
Classification Nearest Neighbor
Nearest-Neighbor Classifiers
Instance Based Learning
COSC 4335: Other Classification Techniques
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier
Nearest Neighbor Classifiers
Junheng, Shengming, Yunsheng 11/2/2018
Lecture Notes for Chapter 4 Artificial Neural Networks
Nearest Neighbors CSC 576: Data Mining.
Nearest Neighbor Classifiers
CSE4334/5334 Data Mining Lecture 7: Classification (4)
COSC 4335: Part2: Other Classification Techniques
Data Mining CSCI 307, Spring 2019 Lecture 11
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Presentation transcript:

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 4 Instance-Based Learning Introduction to Data Mining , 2nd Edition by Tan, Steinbach, Karpatne, Kumar

Instance Based Classifiers Examples: Rote-learner Memorizes entire training data and performs classification only if attributes of record match one of the training examples exactly Nearest neighbor Uses k “closest” points (nearest neighbors) for performing classification

Nearest Neighbor Classifiers Basic idea: If it walks like a duck, quacks like a duck, then it’s probably a duck Training Records Test Record Compute Distance Choose k of the “nearest” records

Nearest-Neighbor Classifiers Requires three things The set of labeled records Distance Metric to compute distance between records The value of k, the number of nearest neighbors to retrieve To classify an unknown record: Compute distance to other training records Identify k nearest neighbors Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote)

Definition of Nearest Neighbor K-nearest neighbors of a record x are data points that have the k smallest distances to x

1 nearest-neighbor Voronoi Diagram

Nearest Neighbor Classification Compute distance between two points: Euclidean distance Determine the class from nearest neighbor list Take the majority vote of class labels among the k-nearest neighbors Weigh the vote according to distance weight factor, w = 1/d2

Nearest Neighbor Classification… Choosing the value of k: If k is too small, sensitive to noise points If k is too large, neighborhood may include points from other classes

Nearest Neighbor Classification… Scaling issues Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes Example: height of a person may vary from 1.5m to 1.8m weight of a person may vary from 90lb to 300lb income of a person may vary from $10K to $1M

Nearest Neighbor Classification… Selection of the right similarity measure is critical: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 vs 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 Euclidean distance = 1.4142 for both pairs

Nearest neighbor Classification… k-NN classifiers are lazy learners since they do not build models explicitly Classifying unknown records are relatively expensive Can produce arbitrarily shaped decision boundaries Easy to handle variable interactions since the decisions are based on local information Selection of right proximity measure is essential Superfluous or redundant attributes can create problems Missing attributes are hard to handle

Improving KNN Efficiency Avoid having to compute distance to all objects in the training set Multi-dimensional access methods (k-d trees) Fast approximate similarity search Locality Sensitive Hashing (LSH) Condensing Determine a smaller set of objects that give the same performance Editing Remove objects to improve efficiency

KNN and Proximity Graphs a graph in which two vertices are connected by an edge if and only if the vertices satisfy particular geometric requirements nearest neighbor graphs, minimum spanning trees Delaunay triangulations relative neighborhood graphs Gabriel graphs See recent papers by Toussaint G. T. Toussaint. Proximity graphs for nearest neighbor decision rules: recent progress. In Interface-2002, 34th Symposium on Computing and Statistics, ontreal, Canada, April 17–20 2002. G. T. Toussaint. Open problems in geometric methods for instance based learning. In Discrete and Computational Geometry, volume 2866 of Lecture Notes in Computer Science, pages 273–283, December 6-9, 2003. G. T. Toussaint. Geometric proximity graphs for improving nearest neighbor methods in instance-based learning and data mining. Int. J. Comput. Geometry Appl., 15(2):101–150, 2005.