K Nearest Neighbor Classification Methods
Qiang Yang

Training Set
(The slide shows the weather training data; the full table appears on the 3-NN slide below.)

The K-Nearest Neighbor Method
- Used for prediction/classification: given an input x, find the K nearest neighbors of x in the training data and let them vote.
- Two choices must be made:
  - #neighbors = K (e.g., K = 3): often a parameter to be determined.
  - The form of the distance function.
- All K neighbors vote and the majority wins; break ties arbitrarily. A weighted variant (weighted K-NN) lets closer neighbors cast larger votes.
- "K" is a variable: often we experiment with different values (K = 1, 3, 5, ...) to find the optimal one. A minimal sketch follows below.
- Why important? K-NN is often a baseline: a new method must beat it to claim innovation.
- Forms of K-NN:
  - Document similarity: cosine distance.
  - Case-based reasoning: an edited case base (a well-chosen subset of cases is sometimes better than keeping 100% of them).
  - Image understanding: manifold learning, learned distance metrics.
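To make the procedure concrete, here is a minimal sketch of majority-vote K-NN. Euclidean distance is used as a common default (the slides leave the distance function open), and the function name and toy data are illustrative, not from the lecture:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the class of x by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every training instance
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points (argsort breaks ties arbitrarily)
    nearest = np.argsort(dists)[:k]
    # Majority vote: the most common label among the k neighbors wins
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

# Toy usage: two clusters, query point near the second one
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array(["no", "no", "yes", "yes"])
print(knn_predict(X, y, np.array([0.95, 0.9]), k=3))  # -> "yes"
```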

How to decide the distance? Try 3-NN on this data:

Outlook   Temperature  Humidity  Windy  Play
sunny     hot          high      FALSE  no
sunny     hot          high      TRUE   no
overcast  hot          high      FALSE  yes
rainy     mild         high      FALSE  yes
rainy     cool         normal    FALSE  yes
rainy     cool         normal    TRUE   no
overcast  cool         normal    TRUE   yes
sunny     mild         high      FALSE  no
sunny     cool         normal    FALSE  yes
rainy     mild         normal    FALSE  yes
sunny     mild         normal    TRUE   yes
overcast  mild         high      TRUE   yes
overcast  hot          normal    FALSE  yes
rainy     mild         high      TRUE   no    <- testing (held-out instance)
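All attributes here are nominal, so one simple choice is the overlap (Hamming) distance: count the attributes on which two instances differ. That choice is our assumption; the slide deliberately leaves the question open. A sketch classifying the held-out row:

```python
from collections import Counter

# Weather data: (outlook, temperature, humidity, windy) -> play
train = [
    (("sunny", "hot", "high", False), "no"),
    (("sunny", "hot", "high", True), "no"),
    (("overcast", "hot", "high", False), "yes"),
    (("rainy", "mild", "high", False), "yes"),
    (("rainy", "cool", "normal", False), "yes"),
    (("rainy", "cool", "normal", True), "no"),
    (("overcast", "cool", "normal", True), "yes"),
    (("sunny", "mild", "high", False), "no"),
    (("sunny", "cool", "normal", False), "yes"),
    (("rainy", "mild", "normal", False), "yes"),
    (("sunny", "mild", "normal", True), "yes"),
    (("overcast", "mild", "high", True), "yes"),
    (("overcast", "hot", "normal", False), "yes"),
]
test = ("rainy", "mild", "high", True)  # held-out row; its actual label is "no"

def hamming(a, b):
    # Overlap distance for nominal attributes: how many positions differ
    return sum(ai != bi for ai, bi in zip(a, b))

# The 3 nearest training instances (sorted() breaks distance ties arbitrarily)
neighbors = sorted(train, key=lambda ex: hamming(ex[0], test))[:3]
votes = Counter(label for _, label in neighbors)
print(votes.most_common(1)[0][0])  # majority vote -> "yes"
```

Under this distance the two closest instances (rainy/mild/high/FALSE and overcast/mild/high/TRUE, each one attribute away) both play "yes", so 3-NN predicts "yes" no matter how the tie for the third neighbor is broken, even though the held-out label is "no". The prediction hinges on the distance chosen.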

K-NN can be misleading
- Consider applying K-NN to the training data itself:
  - What is the accuracy? With K = 1 it is trivially 100%: every training instance is its own nearest neighbor at distance zero.
  - Why? Accuracy on the training set says nothing about generalization to new cases.
  - What should we do in testing? Evaluate on held-out data, or leave each query point out of its own neighbor search (leave-one-out), as in the sketch below.
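A self-contained sketch on synthetic data (the dataset and helper are our own, purely illustrative) contrasting resubstitution accuracy with leave-one-out accuracy for 1-NN:

```python
import numpy as np
from collections import Counter

def knn_predict(X, y, x, k=1, exclude=None):
    """k-NN majority vote; optionally exclude one training index (leave-one-out)."""
    dists = np.linalg.norm(X - x, axis=1)
    if exclude is not None:
        dists[exclude] = np.inf  # the query point may not match itself
    nearest = np.argsort(dists)[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

rng = np.random.default_rng(0)
X = rng.random((50, 2))
y = rng.integers(0, 2, 50)  # random labels: there is no real signal to learn

# Resubstitution: each point is its own nearest neighbor, so 1-NN scores 100%
resub = np.mean([knn_predict(X, y, X[i]) == y[i] for i in range(50)])
# Leave-one-out: an honest estimate; on random labels it hovers near chance
loo = np.mean([knn_predict(X, y, X[i], exclude=i) == y[i] for i in range(50)])
print(f"resubstitution = {resub:.2f}, leave-one-out = {loo:.2f}")
```

The resubstitution score is 1.00 (barring duplicate points with conflicting labels), while the leave-one-out score stays near 0.5 here, exposing how misleading training-set accuracy is for 1-NN.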