MA4102 – Data Mining and Neural Networks
Nathan Ifill, University of Leicester
Image source: Antti Ajanki, “Example of k-nearest neighbor classification”, 28 May 2007

k-nearest neighbour algorithm

- Classes with more frequent examples dominate predictions of unknown instances.
- Weighting each neighbour's vote, for example by the inverse of its distance, helps to remove this problem (a sketch follows below).
- The algorithm can be computationally intensive, depending on the size of the training set.
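
A minimal sketch in plain Python of the distance-weighted voting mentioned above (the function name and data layout are illustrative, not taken from the slides): each of the k nearest neighbours votes with weight 1/distance, so a few very close points can outvote a more numerous but distant class.

    import math
    from collections import defaultdict

    def weighted_knn_predict(train, query, k=3):
        """Classify `query` by distance-weighted voting among its k nearest
        training points. `train` is a list of (point, label) pairs."""
        # Take the k training points closest to the query (Euclidean distance).
        neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
        votes = defaultdict(float)
        for point, label in neighbours:
            # Closer neighbours contribute larger votes; the small constant
            # avoids division by zero if a training point coincides with the query.
            votes[label] += 1.0 / (math.dist(point, query) + 1e-9)
        return max(votes, key=votes.get)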

- Both low and high values of k have their advantages.
- The best value of k depends on the data.
- Cross-validation can be used to compare different values of k (see the sketch below).
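
One way such a comparison could look, sketched with scikit-learn (an assumed library choice; the slides do not name one): estimate the mean cross-validated accuracy for a range of candidate k values and keep the best.

    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)  # any labelled data set would do

    # Mean 5-fold cross-validation accuracy for each candidate k.
    scores = {}
    for k in range(1, 16):
        clf = KNeighborsClassifier(n_neighbors=k)
        scores[k] = cross_val_score(clf, X, y, cv=5).mean()

    best_k = max(scores, key=scores.get)
    print(f"best k = {best_k} (mean CV accuracy {scores[best_k]:.3f})")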

[Figures: k-nearest neighbour classification results with 20, 30, 40, 50, 60 and 70 points of random noise per colour]

Number of points of random noise (for each colour) | Percentage of unsuccessfully tested red points | Percentage of unsuccessfully tested blue points
20 | 13% | 18%
30 | 27% | 24%
40 | 33% | 20%
50 | 30% | 24%
60 | 30% | 26%
70 | 35% | 30%

Pearson product-moment correlation coefficient:
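
As an illustration of how such a coefficient could be obtained from the table, here is a short plain-Python sketch; the numbers it prints are computed from the columns above and are not figures quoted from the slides.

    import math
    import statistics

    noise    = [20, 30, 40, 50, 60, 70]
    red_err  = [13, 27, 33, 30, 30, 35]   # % of red points unsuccessfully tested
    blue_err = [18, 24, 20, 24, 26, 30]   # % of blue points unsuccessfully tested

    def pearson_r(xs, ys):
        """Pearson product-moment correlation coefficient of two equal-length lists."""
        mx, my = statistics.mean(xs), statistics.mean(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    print(pearson_r(noise, red_err), pearson_r(noise, blue_err))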

Condensed nearest neighbour data reduction method

- Outliers are points whose k nearest points are not of the same class.
- X = {x1, x2, ..., xn} is the data set with the outliers removed.
- The prototype set starts as P = {x1}.
- We scan all elements of X and move an element to P if its nearest prototype (its nearest element from P) has a different class label.
- This is repeated until no more new prototypes are found (sketched below).
- Absorbed points are the points which are not prototypes.
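
A rough sketch of this procedure in plain Python; the names are illustrative, and the outlier test is read as "all k nearest neighbours carry a different label", a detail the slides leave open.

    import math

    def nearest(point, candidates):
        """Return the (point, label) pair in `candidates` closest to `point`."""
        return min(candidates, key=lambda c: math.dist(c[0], point))

    def condensed_nn(data, k=3):
        """Split `data`, a list of (point, label) pairs, into prototypes,
        absorbed points and outliers, following the steps above."""
        # 1. Outliers: points whose k nearest other points all carry a different label.
        outliers, clean = [], []
        for point, label in data:
            others = [d for d in data if d != (point, label)]
            knn = sorted(others, key=lambda d: math.dist(d[0], point))[:k]
            (outliers if all(l != label for _, l in knn) else clean).append((point, label))

        # 2. Start the prototype set P with the first remaining point.
        prototypes = [clean[0]]

        # 3. Scan X repeatedly; move a point to P whenever its nearest prototype
        #    has a different class label. Stop after a pass that adds nothing.
        changed = True
        while changed:
            changed = False
            for point, label in clean:
                if (point, label) in prototypes:
                    continue
                if nearest(point, prototypes)[1] != label:
                    prototypes.append((point, label))
                    changed = True

        # 4. Absorbed points: non-outliers that never became prototypes.
        absorbed = [d for d in clean if d not in prototypes]
        return prototypes, absorbed, outliers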

In the applet:
- Prototypes are denoted with squares.
- Outliers are denoted with crosses.
- Absorbed points are denoted with empty circles.
- Absorbed points and outliers are not used for classification, for any maps that are created, or for any type of testing.

- The k-nearest neighbour algorithm classifies objects based on a majority vote of the k nearest training examples.
- We assign the class label that is most frequent amongst the k training examples nearest, or most similar, to our previously unseen instance.
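
A minimal sketch of that majority vote in plain Python (the function name and toy data are illustrative only):

    import math
    from collections import Counter

    def knn_classify(train, query, k=3):
        """Assign `query` the most frequent label among its k nearest
        training examples. `train` is a list of (point, label) pairs."""
        neighbours = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
        return Counter(label for _, label in neighbours).most_common(1)[0][0]

    # Toy example: two classes in the plane, k = 3.
    train = [((0, 0), "red"),  ((1, 0), "red"),  ((0, 1), "red"),
             ((5, 5), "blue"), ((6, 5), "blue"), ((5, 6), "blue")]
    print(knn_classify(train, (0.5, 0.8), k=3))   # -> red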

- CNN reduces the amount of data necessary for classification.
- Points are labelled as either prototypes, outliers or absorbed points.
- Absorbed points and outliers are then no longer used in classification tasks, validation tests or maps of the data set.

References
- Adam McMaster et al., 2011, Wikipedia-k-Nearest-Neighbor-Algorithm [pdf]. Available at: [Accessed 12 Feb 2012]
- E. M. Mirkes, University of Leicester, 2011, kNN and Potential Energy [online]. Available at: [Accessed 2 Feb 2012]
- Antal van den Bosch, 2007, K-nearest neighbor classification [video online]. Available at: [Accessed 10 Feb 2012]
- David Claus, 2004, Nearest Neighbour Condensing and Editing [ppt]. Available at: [Accessed 12 Feb 2012]
- mathematicalmonk, 2011, (ML 1.6) k-Nearest Neighbor classification algorithm [video online]. Available at: [Accessed 10 Feb 2012]
- László Kozma, 2008, k Nearest Neighbors algorithm (kNN) [pdf]. Available at: [Accessed 12 Feb 2012]
- Ola Söder, 2008, kNN classifiers 1. What is a kNN classifier? [online]. Available at: [Accessed 12 Feb 2012]