K Nearest Neighbors and Instance-based methods


1 K Nearest Neighbors and Instance-based methods
Villanova University Machine Learning Project

2 Learning by Analogy: Case-based Reasoning
Case-based systems are a significant branch of artificial intelligence in their own right. A case-based system has two major components:
The case base, which contains a growing set of cases, analogous to either a knowledge base or a training set
The problem solver, which has a case retriever and a case reasoner, and may also have a case installer

3 Case-Based Retrieval
Cases are described as a set of features. Retrieval uses methods such as:
Nearest neighbor: compare all features to all cases in the case base and choose the closest match
Indexed: compute and store some indices with each case and retrieve cases with matching indices
Domain-based model clustering: the case base is organized into a domain model; insertion is harder, but retrieval is easier

4 Examples
Glass classification in Weka: features are the values for Na, K, etc.
Text classification ("documents like this one"): features are the word frequencies in the document

5 Simple Case-Based Reasoning Example
A frequency matrix for diagnosing system problems is a simple case-based example:
The representation is a matrix of observed symptoms and causes
Each case is an entry in a cell of the matrix
The critic is the actual outcome of the case
The learner adds an entry to the appropriate cells
The performer matches symptoms and chooses possible causes

6 Car Diagnosis
                               Battery dead   Out of gas   Alternator bad   Battery bad
Car won't start                case 2         case 3
Car stalls at stoplights       case 4         case 5
Car misfires in rainy weather
Lights won't come on
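A minimal sketch of how such a frequency matrix might be implemented; the FrequencyMatrix class and its method names are hypothetical, invented for illustration:

    import java.util.*;
    import java.util.stream.Collectors;

    // Sketch of a frequency-matrix case base: rows are symptoms, columns are
    // causes, and each cell holds the ids of the cases observed there.
    public class FrequencyMatrix {
        // symptom -> (cause -> ids of cases with that symptom and actual cause)
        private final Map<String, Map<String, List<Integer>>> cells = new HashMap<>();

        // Learner: add the case to the appropriate cell once its outcome is known.
        public void addCase(String symptom, String cause, int caseId) {
            cells.computeIfAbsent(symptom, s -> new HashMap<>())
                 .computeIfAbsent(cause, c -> new ArrayList<>())
                 .add(caseId);
        }

        // Performer: rank causes for a symptom by how many stored cases support them.
        public List<String> possibleCauses(String symptom) {
            Map<String, List<Integer>> row =
                cells.getOrDefault(symptom, Collections.emptyMap());
            return row.entrySet().stream()
                      .sorted((a, b) -> b.getValue().size() - a.getValue().size())
                      .map(Map.Entry::getKey)
                      .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            FrequencyMatrix m = new FrequencyMatrix();
            m.addCase("Car won't start", "Battery dead", 2);
            m.addCase("Car won't start", "Out of gas", 3);
            System.out.println(m.possibleCauses("Car won't start"));
        }
    }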

7 Case-based Reasoning
The definition of relevant features is critical: we need the ones which influence outcomes, at the right level of granularity.
The reasoner can be a complex planning and what-if reasoning system, or a simple query for missing data.
It only really becomes a "learning" system if there is a case installer as well; the case base can then grow cumulatively.

8 K-Nearest Neighbor
All of the training instances together form the trained system. For a new case, determine the "distance" to each training instance, typically using:
Euclidean distance
Manhattan distance
Weighted distance metrics
Then use the k nearest instances to determine the class.
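A minimal sketch of the algorithm in Java; the KnnClassifier class and its method names are hypothetical, invented for illustration:

    import java.util.*;

    // Sketch of k-nearest-neighbors: store every training instance, then
    // classify a query by majority vote among the k closest instances.
    public class KnnClassifier {
        private final List<double[]> features = new ArrayList<>();
        private final List<String> labels = new ArrayList<>();

        // "Training" is just storing the instance (lazy learning).
        public void add(double[] x, String label) {
            features.add(x);
            labels.add(label);
        }

        public String classify(double[] query, int k) {
            // Order training indices by Euclidean distance to the query.
            Integer[] idx = new Integer[features.size()];
            for (int i = 0; i < idx.length; i++) idx[i] = i;
            Arrays.sort(idx, Comparator.comparingDouble(
                (Integer i) -> euclidean(features.get(i), query)));

            // Majority vote among the k nearest neighbors.
            Map<String, Integer> votes = new HashMap<>();
            for (int i = 0; i < k && i < idx.length; i++)
                votes.merge(labels.get(idx[i]), 1, Integer::sum);
            return Collections.max(votes.entrySet(),
                                   Map.Entry.comparingByValue()).getKey();
        }

        static double euclidean(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(sum);
        }
    }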

9 Distance Measures
Euclidean: the shortest distance between two points, in a straight line
Manhattan: "block distance"; the shortest path between two points using only 90-degree angles
Weighted: a variant of Euclidean giving more weight to some dimensions
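As a sketch, the three measures over feature vectors of equal length; the per-feature weight vector w in the weighted version is an assumption, since the slide does not specify how the weights are chosen:

    // Distance measures over feature vectors of equal length.
    public class Distances {
        // Euclidean: square root of the sum of squared differences.
        static double euclidean(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(sum);
        }

        // Manhattan ("block distance"): sum of absolute differences.
        static double manhattan(double[] a, double[] b) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
            return sum;
        }

        // Weighted Euclidean: each squared difference is scaled by a weight w[i].
        static double weighted(double[] a, double[] b, double[] w) {
            double sum = 0;
            for (int i = 0; i < a.length; i++) sum += w[i] * (a[i] - b[i]) * (a[i] - b[i]);
            return Math.sqrt(sum);
        }
    }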

10 Example
[Figure: scatter plot of training instances plotted by Feature 1 and Feature 2, with two unlabeled query points marked "?"]

11 KNN: What Value for K?
There is a tradeoff in choosing k.
Looking at more neighbors (larger k): ignores noise better, with less risk of outliers distorting the decision, but is computationally more expensive.
Looking at fewer neighbors (smaller k): faster, and does not risk forcing distant neighbors into the decision.
Start with k = 1, then 3, etc., until accuracy drops; a sketch of this search follows. Weka has a capability to do this automatically.
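A minimal sketch of that search, assuming the hypothetical KnnClassifier from the earlier sketch and a held-out validation set:

    // Sketch: try k = 1, 3, 5, ... and keep the k with the best accuracy on a
    // held-out validation set. KnnClassifier is the hypothetical class
    // sketched earlier, not a Weka API.
    static int chooseK(KnnClassifier knn, double[][] valX, String[] valY, int maxK) {
        int bestK = 1;
        double bestAcc = -1;
        for (int k = 1; k <= maxK; k += 2) {
            int correct = 0;
            for (int i = 0; i < valX.length; i++)
                if (knn.classify(valX[i], k).equals(valY[i])) correct++;
            double acc = (double) correct / valX.length;
            if (acc > bestAcc) { bestAcc = acc; bestK = k; }
        }
        return bestK;
    }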

12 KNN Advantages
Incremental: each new instance for which we get feedback can be added to the training data
Training is very fast (lazy!)
All information is retained
Can learn quite complex relationships

13 KNN Disadvantages
Uses a lot of storage, since all instances are retained
Slow at query time
Sensitive to irrelevant features
Does not create a general model which can be examined

14 KNN in Weka
The basic KNN classifier in Weka is IBk (Instance-Based, k neighbors), under the Lazy category.
The default k value is 1, settable in the Choose window (right-click).
Setting crossValidate to True will use hold-one-out cross-validation to choose the best k between 1 and the value set in the parameters.
windowSize can be used to set a limit on the number of training cases; new cases replace the oldest. A value of zero (the default) means no limit.
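The same classifier can be used from Weka's Java API. A minimal sketch: IBk, setKNN, setCrossValidate, setWindowSize, and the loading code are real Weka API calls, but the file name glass.arff is an assumption:

    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class IBkExample {
        public static void main(String[] args) throws Exception {
            // Load a dataset (file name is illustrative) and mark the class attribute.
            Instances data = new DataSource("glass.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            IBk knn = new IBk();
            knn.setKNN(3);              // k = 3 instead of the default 1
            knn.setCrossValidate(true); // hold-one-out CV picks the best k in 1..3
            knn.setWindowSize(0);       // 0 = no limit on stored training cases
            knn.buildClassifier(data);

            // Classify the first instance as a demonstration.
            double predicted = knn.classifyInstance(data.instance(0));
            System.out.println(data.classAttribute().value((int) predicted));
        }
    }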

15 IBk Outputs
IBk gives us the same output sections as J48. However, under Classifier Model we see only:
IB1 instance-based classifier using 1 nearest neighbour(s) for classification
IBk does not show us anything comparable to the decision tree of J48; instance-based methods could only show the entire database of examples. For KNN we will be most interested in what happens with new examples.

