K Nearest Neighbors and Instance-based methods


K Nearest Neighbors and Instance-based methods Villanova University Machine Learning Project

Learning by Analogy: Case-based Reasoning

Case-based systems are a significant chunk of artificial intelligence in their own right. A case-based system has two major components:
- The case base, which contains a growing set of cases, analogous to either a knowledge base or a training set.
- The problem solver, which has a case retriever and a case reasoner, and may also have a case installer.
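
As a rough sketch (hypothetical Java types, not from the slides), the two components and the optional installer might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// A case pairs a problem description (feature values) with its known solution.
class Case {
    double[] features;
    String solution;
    Case(double[] features, String solution) {
        this.features = features;
        this.solution = solution;
    }
}

// The case base: a growing collection of solved cases.
class CaseBase {
    List<Case> cases = new ArrayList<>();
    void install(Case c) { cases.add(c); }  // the optional "case installer"
}

// The problem solver's two parts: retrieval and reasoning.
interface CaseRetriever {
    Case retrieve(CaseBase base, double[] query);
}
interface CaseReasoner {
    String adapt(Case retrieved, double[] query);
}
```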

Case-Based Retrieval

Cases are described as a set of features. Retrieval uses methods such as:
- Nearest neighbor: compare all features against all cases in the data set and choose the closest match.
- Indexed: compute and store some indices with each case and retrieve matching indices.
- Domain-based model clustering: the case base is organized into a domain model; insertion is harder, but retrieval is easier.
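
A minimal sketch of the nearest-neighbor retrieval strategy, reusing the hypothetical Case, CaseBase, and CaseRetriever types sketched above:

```java
// Nearest-neighbor retrieval: compare the query's features against every
// case in the case base and return the closest match.
class NearestNeighborRetriever implements CaseRetriever {
    public Case retrieve(CaseBase base, double[] query) {
        Case best = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Case c : base.cases) {
            double sum = 0.0;
            for (int i = 0; i < query.length; i++) {
                double d = query[i] - c.features[i];
                sum += d * d;  // squared Euclidean distance; sqrt not needed for ranking
            }
            if (sum < bestDist) {
                bestDist = sum;
                best = c;
            }
        }
        return best;
    }
}
```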

Examples

- Glass classification in Weka: features are the values for Na, K, etc.
- Text classification ("documents like this one"): features are the word frequencies in the document.

Simple Case-Based Reasoning Example

A frequency matrix for diagnosing system problems is a simple case-based example:
- The representation is a matrix of observed symptoms and causes.
- Each case is an entry in a cell of the matrix.
- The critic is the actual outcome of the case.
- The learner adds an entry to the appropriate cells.
- The performer matches symptoms and chooses possible causes.

Car Diagnosis

The columns of the matrix are possible causes: Battery dead, Out of gas, Alternator bad, Battery bad. The rows are observed symptoms, with the recorded cases noted per row:
- Car won't start: case 2, case 3
- Car stalls at stoplights: case 4, case 5
- Car misfires in rainy weather: (none shown)
- Lights won't come on: (none shown)
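
A minimal sketch of such a frequency matrix in Java (the symptom and cause strings are just illustrative):

```java
import java.util.HashMap;
import java.util.Map;

// Frequency matrix for diagnosis: counts how often each (symptom, cause)
// pair has been observed. The learner fills cells as outcomes become
// known; the performer ranks causes for a symptom by their counts.
class FrequencyMatrix {
    Map<String, Map<String, Integer>> counts = new HashMap<>();

    // Learner: record one resolved case in the appropriate cell.
    void recordCase(String symptom, String cause) {
        counts.computeIfAbsent(symptom, s -> new HashMap<>())
              .merge(cause, 1, Integer::sum);
    }

    // Performer: given a symptom, choose the most frequently seen cause.
    String mostLikelyCause(String symptom) {
        Map<String, Integer> row = counts.getOrDefault(symptom, new HashMap<>());
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : row.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }
}
```

The critic would call something like recordCase("Car won't start", "Out of gas") when a case is resolved, and the performer would call mostLikelyCause("Car won't start") for a new problem.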

Case-based Reasoning

- The definition of relevant features is critical: we need to get the ones which influence outcomes, at the right level of granularity.
- The reasoner can be a complex planning and what-if reasoning system, or a simple query for missing data.
- The system only really becomes a "learning" system if there is a case installer as well; with one, it can grow cumulatively.

K-Nearest Neighbor

- All of the instances together form the trained system.
- For a new case, determine the "distance" to each training instance, typically using:
  - Euclidean distance
  - Manhattan distance
  - Weighted distance metrics
- Use the k nearest instances to determine the class.
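
A minimal, self-contained sketch of the k-NN decision rule, assuming numeric features, string class labels, Euclidean distance, and a simple majority vote:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class KnnClassifier {
    // Classify a query point by majority vote among its k nearest
    // training instances under Euclidean distance.
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        Integer[] idx = new Integer[train.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        // Sort training indices by distance to the query.
        Arrays.sort(idx, Comparator.comparingDouble(i -> euclidean(train[i], query)));
        // Tally the class labels of the k closest instances.
        Map<String, Integer> votes = new HashMap<>();
        for (int j = 0; j < k; j++) votes.merge(labels[idx[j]], 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }
}
```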

Distance Measures

- Euclidean: the shortest distance between two points, in a straight line.
- Manhattan: "block distance"; the shortest path between two points using only 90 degree angles.
- Weighted: a variant on Euclidean giving more weight to some directions.
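
All three measures combine per-coordinate differences; they differ only in how. A sketch:

```java
class Distances {
    // Euclidean: square root of the summed squared coordinate differences.
    static double euclidean(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    // Manhattan ("block distance"): summed absolute coordinate differences.
    static double manhattan(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += Math.abs(a[i] - b[i]);
        return s;
    }

    // Weighted Euclidean: per-feature weights w stretch or shrink some directions.
    static double weighted(double[] a, double[] b, double[] w) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += w[i] * (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
}
```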

Example

[Figure: training instances plotted on two feature axes, Feature 1 and Feature 2, with two query points marked "?".]

KNN: What Value for K?

There is a tradeoff between looking at more neighbors (larger k) and fewer:
- Larger k ignores noise better, with less risk of outliers distorting the decision, but is computationally more expensive.
- Looking at fewer neighbors is faster and does not risk forcing distant neighbors into the decision.
Start with k = 1, then 3, and so on, until accuracy drops. Weka has a capability to do this automatically.
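
A sketch of that search loop, assuming a hypothetical accuracy helper that evaluates a k-NN classifier (e.g., on a held-out validation set) for a given k:

```java
import java.util.function.IntToDoubleFunction;

class KSelector {
    // Try k = 1, 3, 5, ... and stop when accuracy drops; return the best k.
    // accuracy is a hypothetical helper supplied by the caller.
    static int chooseK(IntToDoubleFunction accuracy, int maxK) {
        int bestK = 1;
        double bestAcc = accuracy.applyAsDouble(1);
        for (int k = 3; k <= maxK; k += 2) {
            double acc = accuracy.applyAsDouble(k);
            if (acc < bestAcc) break;  // accuracy dropped: stop searching
            bestAcc = acc;
            bestK = k;
        }
        return bestK;
    }
}
```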

KNN Advantages

- Incremental: each new instance for which we get feedback can be added to the training data.
- Training is very fast (lazy!).
- All information is retained.
- Can learn quite complex relationships.

KNN Disadvantages

- Uses a lot of storage, since all instances are retained.
- Slow at query time.
- Sensitive to irrelevant features.
- Does not create a general model which can be examined.

KNN in Weka

- The basic KNN classifier in Weka is IBk (instance-based, k), under the Lazy category.
- The default k value is 1; it is settable in the Choose window (right-click).
- Setting crossValidate to true will use hold-one-out cross-validation to choose the best k between 1 and the value set in the parameters.
- windowSize can be used to set a limit on the number of training cases; new cases replace the oldest. A value of zero (the default) means no limit.
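
For use outside the Explorer GUI, a minimal sketch of the same classifier through Weka's Java API (the file name glass.arff is an assumption; the IBk setters mirror the GUI parameters described above):

```java
import weka.classifiers.lazy.IBk;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class IBkDemo {
    public static void main(String[] args) throws Exception {
        // Load a dataset (assumes a local ARFF file, e.g. Weka's glass data).
        Instances data = new DataSource("glass.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);  // class is the last attribute

        IBk knn = new IBk();
        knn.setKNN(3);               // k (also the upper bound when cross-validating)
        knn.setCrossValidate(true);  // hold-one-out CV picks the best k in 1..3
        knn.setWindowSize(0);        // 0 (the default) = keep all training instances
        knn.buildClassifier(data);
        System.out.println(knn);     // prints the classifier-model summary
    }
}
```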

IBk Outputs

- IBk gives us the same output sections as J48. However, under Classifier Model we see only:
  IB1 instance-based classifier using 1 nearest neighbour(s) for classification
- IBk does not show us anything comparable to the decision tree of J48; instance-based methods could only show the entire database of examples.
- For KNN we will be most interested in what happens with new examples.