Nearest Neighbor. Predicting Bankruptcy

Presentation transcript:

Nearest Neighbor

Predicting Bankruptcy

Nearest Neighbor. Remember all your data. When someone asks a question:
–Find the nearest old data point.
–Return the answer associated with it.
In order to say which point is nearest, we have to define what we mean by "near". Typically, we use Euclidean distance between two points, which is just the square root of the sum of the squared differences between corresponding feature values.
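As a minimal sketch (the training data and labels here are invented for illustration), the whole "algorithm" might look like this in Python:

```python
import math

def euclidean(a, b):
    # Square root of the sum of squared differences between feature values.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(train, query):
    # "Learning" is just remembering train, a list of (features, label) pairs.
    # To answer a query, return the label of the closest remembered point.
    _, best_label = min(train, key=lambda point: euclidean(point[0], query))
    return best_label

# Hypothetical training data: two numeric features and a yes/no label.
train = [((1.0, 2.0), "yes"), ((4.0, 5.0), "no"), ((0.5, 1.5), "yes")]
print(nearest_neighbor(train, (0.8, 1.9)))  # -> "yes"
```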

Scaling. The naive Euclidean distance isn't always appropriate, though. Consider the case where we have two features describing a car:
–f1 = weight in pounds
–f2 = number of cylinders
Any effect of f2 will be completely lost because of the relative scales. So, rescale the inputs: compute the average value of each feature, x-bar, and subtract it from every feature value, which gives us features all centered at 0. Then, to deal with the range, compute the standard deviation and divide each value by that. This transformation puts all of the features on about equal footing.
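A sketch of that rescaling (often called standardization), with made-up car data; the helper name is mine:

```python
import statistics

def standardize(column):
    # Subtract the mean so the feature is centered at 0,
    # then divide by the standard deviation to put ranges on equal footing.
    mean = statistics.mean(column)
    std = statistics.stdev(column)
    return [(value - mean) / std for value in column]

weights_lb = [2400, 3100, 4200, 1800]   # f1 = weight in pounds
cylinders = [4, 6, 8, 4]                # f2 = number of cylinders
scaled_rows = list(zip(standardize(weights_lb), standardize(cylinders)))
```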

Scaling II. Another popular, but somewhat advanced, technique is to use cross-validation and gradient descent to choose weightings of the features that give the best performance on the particular data set. Let's see how nearest neighbor works on our bankruptcy example. Let's say we've thought about the domain and decided that the R feature needs to be scaled up by 5 in order to be appropriately balanced against the L feature. So we'll use Euclidean distance, but with the R values multiplied by 5 first.
–We've scaled the axes on the slide so that the two dimensions are graphically equal.
–This means that the locus of points at a particular distance d from a point on our graph will appear as a circle.
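A sketch of that scaled distance: weight each feature difference before squaring, with R weighted by 5 and L left alone (the function name and interface are my own framing):

```python
import math

def weighted_euclidean(a, b, weights):
    # Multiply each per-feature difference by its weight, then do the usual
    # square, sum, and square root.
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

# Bankruptcy features are (R, L); scale R up by 5.
d = weighted_euclidean((0.3, 2.0), (0.2, 1.0), weights=(5.0, 1.0))
```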

Predicting Bankruptcy. Now, let's say we have a new person with R equal to 0.3 and L equal to 2. What y value should we predict? The nearest training point under our scaled distance is a "no", and so our answer would be "no".

Hypothesis. So, what is the hypothesis of the nearest neighbor algorithm? It's rather different from our other algorithms, in that it isn't explicitly constructing a description of a hypothesis based on the data it sees. Given a set of points and a distance metric, you can divide the space up into regions, one per point, each representing the set of locations in space that are nearer to that point than to any of the others.

"Voronoi partition" of the space Now, we can think of our hypothesis as being represented by the edges in the Voronoi partition that separate a region associated with a positive point from a region associated with a negative one. It's important to note that we never explicitly compute this boundary; it just arises out of the "nearest neighbor" query process.

Time and Space. Learning is fast:
–We just have to remember the training data.
What takes longer is answering a query. If we do it naively, we have to compute, for each point in our training set (and there are m of them), the distance to the query point (which takes about n computations, since there are n features to compare). So, overall, a query takes about m * n time. However, there are special data structures such as KD-trees, with which we can reduce this to about n * log m per query.
Memory might be an issue, since we have to remember all the data.
–However, KD-trees work fine for secondary storage, keeping disk blocks at the leaves.
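For a rough comparison, here is a sketch of the naive scan against an off-the-shelf kd-tree query; this assumes NumPy and SciPy are available, and the data is random just for illustration:

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
train = rng.random((10_000, 2))              # m = 10,000 points, n = 2 features
query = np.array([0.3, 0.7])

# Naive query: compute all m distances, about m * n work.
naive_idx = int(np.argmin(((train - query) ** 2).sum(axis=1)))

# kd-tree query: roughly n * log m work for low-dimensional data.
tree = KDTree(train)
_, tree_idx = tree.query(query, k=1)
assert naive_idx == int(tree_idx)
```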

KD-Trees. Generalize binary search trees, but the attribute used for searching rotates among the dimensions. Each internal node holds a pair of an attribute A and a value V; leaves are disk blocks.

KD-Trees. [Figure: an example kd-tree over (Sal, Age) data and the corresponding partition of the plane. Internal nodes test values such as Sal 150, Age 60, Age 47, Age 38, Sal 80, and Sal 300; the leaves hold the data points a through l.]
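A minimal in-memory sketch of this structure, alternating the split attribute at each level; a real database implementation would, as noted above, keep disk blocks at the leaves, and all names here are my own:

```python
def build_kd(points, depth=0, leaf_size=2):
    # points: list of (sal, age) tuples. Stop and make a leaf when few remain.
    if len(points) <= leaf_size:
        return {"leaf": points}
    axis = depth % 2                          # rotate: 0 = Sal, 1 = Age
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "axis": axis,
        "value": points[mid][axis],           # the (attribute, value) pair in the node
        "left": build_kd(points[:mid], depth + 1, leaf_size),
        "right": build_kd(points[mid:], depth + 1, leaf_size),
    }
```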

Noise. In our example so far, there has not been much (apparent) noise;
–the boundary between positives and negatives is clean and simple.
Let's now consider the case where there's a blue point down among the reds:
–Someone with an apparently healthy financial record goes bankrupt.
There are two ways to deal with this data point.
–One is to assume that it is not noise and take it at face value.
–The other is to say that this example is an "outlier": it represents an unusual case that we would prefer largely to ignore, and not to incorporate into our hypothesis.

K-Nearest Neighbors. If we think there might be noise in the data, we can change the algorithm a bit to try to ignore it. The k-nearest neighbor algorithm:
–It's just like the old algorithm, except that when we get a query, we search for the k closest points to the query point and output what the majority says.
–In this case, we've chosen k to be 3.
–The three closest points consist of two "no"s and a "yes", so our answer would be "no".
Find the optimal k using cross-validation.
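A sketch of the majority-vote variant, plus a simple leave-one-out loop for choosing k; the helper names and candidate values of k are mine:

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    # Find the k closest stored points and output the majority label.
    neighbors = sorted(train, key=lambda point: euclidean(point[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def choose_k(train, candidates=(1, 3, 5, 7)):
    # Leave-one-out cross-validation: score each k by how often a held-out
    # point is predicted correctly from the rest of the data.
    def accuracy(k):
        hits = sum(
            knn_predict(train[:i] + train[i + 1:], x, k) == y
            for i, (x, y) in enumerate(train)
        )
        return hits / len(train)
    return max(candidates, key=accuracy)
```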