K nearest neighbor classification Presented by Vipin Kumar University of Minnesota Based on discussion in "Intro to Data Mining" by Tan,

Slides:



Advertisements
Similar presentations
1 Classification using instance-based learning. 3 March, 2000Advanced Knowledge Management2 Introduction (lazy vs. eager learning) Notion of similarity.
Advertisements

Lecture 3-4: Classification & Clustering
K-NEAREST NEIGHBORS AND DECISION TREE Nonparametric Supervised Learning.
Data Mining Classification: Alternative Techniques
NEAREST NEIGHBOR CLASSIFICATION PRESENTED BY Sam Brown
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
Indian Statistical Institute Kolkata
Lazy vs. Eager Learning Lazy vs. eager learning
Classification and Decision Boundaries
Data Mining Classification: Alternative Techniques
Navneet Goyal. Instance Based Learning  Rote Classifier  K- nearest neighbors (K-NN)  Case Based Resoning (CBR)
Carla P. Gomes CS4700 CS 4700: Foundations of Artificial Intelligence Carla P. Gomes Module: Nearest Neighbor Models (Reading: Chapter.
Instance Based Learning
1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes, one of the attributes is the class.
Machine Learning Group University College Dublin Nearest Neighbour Classifiers Lazy v’s Eager k-NN Condensed NN.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Lazy Learning k-Nearest Neighbour Motivation: availability of large amounts of processing power improves our ability to tune k-NN classifiers.
Comparison of Instance-Based Techniques for Learning to Predict Changes in Stock Prices iCML Conference December 10, 2003 Presented by: David LeRoux.
Data Mining Classification: Alternative Techniques
1 Nearest Neighbor Learning Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI.
CES 514 – Data Mining Lec 9 April 14 Mid-term k nearest neighbor.
Aprendizagem baseada em instâncias (K vizinhos mais próximos)
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
Instance Based Learning Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán) 1.
INSTANCE-BASE LEARNING
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
CS Instance Based Learning1 Instance Based Learning.
1 DISCRIMINANT ADAPTIVE NEAREST NEIGHBOR CLASSIFICATION PRESENTED BY Carlos Garcia 1.
K Nearest Neighborhood (KNNs)
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.
COMMON EVALUATION FINAL PROJECT Vira Oleksyuk ECE 8110: Introduction to machine Learning and Pattern Recognition.
1 Data Mining Lecture 5: KNN and Bayes Classifiers.
Introduction to machine learning and data mining 1 iCSC2014, Juan López González, University of Oviedo Introduction to machine learning Juan López González.
Basic Data Mining Technique
Chapter 8 The k-Means Algorithm and Genetic Algorithm.
Nearest Neighbor (NN) Rule & k-Nearest Neighbor (k-NN) Rule Non-parametric : Can be used with arbitrary distributions, No need to assume that the form.
Non-Parameter Estimation 主講人:虞台文. Contents Introduction Parzen Windows k n -Nearest-Neighbor Estimation Classification Techiques – The Nearest-Neighbor.
N EAREST N EIGHBOR C LASSIFICATION P RESENTED BY J ESSE F LEMING JESSE. UVM. EDU CS D ATA M INING U NIVERSITY OF V ERMONT JESSE. FLEMING.
1 DISCRIMINANT ADAPTIVE NEAREST NEIGHBOR CLASSIFICATION PRESENTED BY Scott Connor DATA MINING – Xindong Wu (Course Instructor) UNIVERSITY.
 2003, G.Tecuci, Learning Agents Laboratory 1 Learning Agents Laboratory Computer Science Department George Mason University Prof. Gheorghe Tecuci 9 Instance-Based.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Statistical Inference (By Michael Jordon) l Bayesian perspective –conditional perspective—inferences.
A Bootstrap Interval Estimator for Bayes’ Classification Error Chad M. Hawes a,b, Carey E. Priebe a a The Johns Hopkins University, Dept of Applied Mathematics.
Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 5 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar.
METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications Part 6: Nearest and k-nearest Neighbor Classification.
CpSc 810: Machine Learning Instance Based Learning.
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
Outline K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
Lazy Learners K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
DATA MINING LECTURE 10b Classification k-nearest neighbor classifier
CS Machine Learning Instance Based Learning (Adapted from various sources)
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
Kansas State University Department of Computing and Information Sciences CIS 890: Special Topics in Intelligent Systems Wednesday, November 15, 2000 Cecil.
Instance-Based Learning Evgueni Smirnov. Overview Instance-Based Learning Comparison of Eager and Instance-Based Learning Instance Distances for Instance-Based.
k-Nearest neighbors and decision tree
Non-Parameter Estimation
Efficient Image Classification on Vertically Decomposed Data
Classification Nearest Neighbor
Data Mining: Concepts and Techniques (3rd ed
Instance Based Learning (Adapted from various sources)
Efficient Image Classification on Vertically Decomposed Data
K Nearest Neighbor Classification
Classification Nearest Neighbor
Nearest-Neighbor Classifiers
Instance Based Learning
COSC 4335: Other Classification Techniques
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier
Nearest Neighbors CSC 576: Data Mining.
Data Mining Classification: Alternative Techniques
CSE4334/5334 Data Mining Lecture 7: Classification (4)
Presentation transcript:

K nearest neighbor classification Presented by Vipin Kumar University of Minnesota Based on discussion in "Intro to Data Mining" by Tan, Steinbach, Kumar

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, Nearest-Neighbor Classifiers l Requires three things –The set of stored records –Distance metric to compute distance between records –The value of k, the number of nearest neighbors to retrieve l To classify an unknown record: –Compute distance to other training records –Identify k nearest neighbors –Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote)

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, Nearest Neighbor Classification l Compute distance between two points: –Example: Euclidean distance –Other distance measures are often better l Determine the class from nearest neighbor list –take the majority vote of class labels among the k-nearest neighbors –Weigh the vote according to distance  weight factor, w = 1/d 2

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, Nearest Neighbor Classification… l Choosing the value of k: –If k is too small, sensitive to noise points –If k is too large, neighborhood may include points from other classes

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, Nearest Neighbor Classification… l Scaling issues –Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes –Example:  height of a person may vary from 1.5m to 1.8m  weight of a person may vary from 90lb to 300lb  income of a person may vary from $10K to $1M

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, Nearest neighbor Classification… l k-NN classifiers are lazy learners –Models are not build explicitly unlike eager learners (e.g., decision trees, SVM, etc) –Building model is cheap –Classifying unknown records are relatively expensive  Requires computation of k-nearest neighbors

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, Impact of K Nearest Neighbor Classification l Simple technique that is easily implemented l Well suited for –multi-modal classes –Records with multiple class labels l Can sometimes be the best method –Michihiro Kuramochi and George Karypis, Gene Classification using Expression Profiles: A Feasibility Study, International Journal on Artificial Intelligence Tools. Vol. 14, No. 4, pp , 2005 –K nearest neighbor outperformed SVM for protein function prediction using expression profiles

ICDM: Top Ten Data Mining Algorithms K nearest neighbor classification December, l Hastie, T. and Tibshirani, R Discriminant Adaptive Nearest Neighbor Classification. IEEE Trans. Pattern Anal. Mach. Intell. 18, 6 (Jun. 1996), DOI= l D. Wettschereck, D. Aha, and T. Mohri. A review and empirical evaluation of featureweighting methods for a class of lazy learning algorithms. Artificial Intelligence Review, 11:273–314, l B. V. Dasarathy. Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, l Godfried T. Toussaint: Open Problems in Geometric Methods for Instance-Based Learning. JCDCG 2002: l Godfried T. Toussaint, "Proximity graphs for nearest neighbor decision rules: recent progress," Interface-2002, 34th Symposium on Computing and Statistics (theme: Geoscience and Remote Sensing), Ritz-Carlton Hotel, Montreal, Canada, April 17-20, 2002 l Paul Horton and Kenta Nakai. Better prediction of protein cellular localization sites with the k nearest neighbors classifier. In Proceeding of the Fifth International Conference on Intelligent Systems for Molecular Biology, pages , Menlo Park, AAAI Press. l J.M. Keller, M.R. Gray, and jr. J.A. Givens. A fuzzy k-nearest neighbor. algorithm. IEEE Trans. on Syst., Man & Cyb., 15(4):580–585, 1985 l Seidl, T. and Kriegel, H Optimal multi-step k-nearest neighbor search. In Proceedings of the 1998 ACM SIGMOD international Conference on Management of Data (Seattle, Washington, United States, June , 1998). A. Tiwary and M. Franklin, Eds. SIGMOD '98. ACM Press, New York, NY, DOI= l Song, Z. and Roussopoulos, N K-Nearest Neighbor Search for Moving Query Point. In Proceedings of the 7th international Symposium on Advances in Spatial and Temporal Databases (July , 2001). C. S. Jensen, M. Schneider, B. Seeger, and V. J. Tsotras, Eds. Lecture Notes In Computer Science, vol Springer-Verlag, London, l N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor queries. In Proc. of the ACM SIGMOD Intl. Conf. on Management of Data, pages , l Hart, P. (1968). The condensed nearest neighbor rule. IEEE Trans. on Inform. Th., 14, l Gates, G. W. (1972). The Reduced Nearest Neighbor Rule. IEEE Transactions on Information Theory 18: l D.T. Lee, "On k-nearest neighbor Voronoi diagrams in the plane," IEEE Trans. on Computers, Vol. C-31, 1982, pp l Franco-Lopez, H., Ek, A.R., Bauer, M.E., Estimation and mapping of forest stand density, volume, and cover type using the k-nearest neighbors method. Rem. Sens. Environ. 77, 251–274. l Bezdek, J. C., Chuah, S. K., and Leep, D Generalized k-nearest neighbor rules. Fuzzy Sets Syst. 18, 3 (Apr. 1986), DOI= l Cost, S., Salzberg, S.: A weighted nearest neighbor algorithm for learning with symbolic features. Machine Learning 10 (1993) 57–78. (PEBLS: Parallel Examplar-Based Learning System) General References