More Methodology; Nearest-Neighbor Classifiers Sec 4.7.

Review: Properties of DTs Axis-orthogonal, hyperrectangular, piecewise-constant models Categorical labels Non-metric

Separation of train & test Fundamental principle (1st amendment of ML): Don’t evaluate accuracy (performance) of your classifier (learning system) on the same data used to train it!

Holdout data Usual to “hold out” a separate set of data for testing; not used to train the classifier A.k.a. test set, holdout set, evaluation set, etc. Accuracy measured on the training data is the training-set accuracy; accuracy measured on the holdout data is the test-set (or generalization) accuracy
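A minimal sketch of such a split (not from the slides; the function name and numpy usage are illustrative):

    import numpy as np

    def split_holdout(X, y, test_frac=0.3, seed=0):
        """Randomly hold out a fraction of the data for testing."""
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(y))          # shuffled instance indices
        n_test = int(len(y) * test_frac)
        train, test = idx[n_test:], idx[:n_test]
        return X[train], y[train], X[test], y[test]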

Gotchas... What if you’re unlucky when you split data into train/test? E.g., all train data are class A and all test are class B? No “red” things show up in training data Best answer: stratification Try to make sure class (+feature) ratios are same in train/test sets (and same as original data) Why does this work? Almost as good: randomization Shuffle data randomly before split Why does this work?
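A hedged sketch of a stratified split (helper names are mine): hold out the same fraction of each class, so the train/test class ratios match the original data.

    import numpy as np

    def split_stratified(X, y, test_frac=0.3, seed=0):
        """Hold out test_frac of each class separately, shuffling within the class."""
        rng = np.random.default_rng(seed)
        train_idx, test_idx = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))   # this class's instances, shuffled
            n_test = int(round(len(idx) * test_frac))
            test_idx.extend(idx[:n_test])
            train_idx.extend(idx[n_test:])
        train_idx, test_idx = np.array(train_idx), np.array(test_idx)
        return X[train_idx], y[train_idx], X[test_idx], y[test_idx]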

More gotchas... What if your data set is small? Might not be able to get perfect stratification Can’t get really representative accuracy from any single train/test split A: cross-validation

    for (i=0; i<k; ++i) {
      [Xtrain,Ytrain,Xtest,Ytest] = splitData(X,Y,N/k,i);
      model[i]  = train(Xtrain,Ytrain);
      cvAccs[i] = measureAcc(model[i],Xtest,Ytest);
    }
    avgAcc = mean(cvAccs);
    stdAcc = stddev(cvAccs);
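The same loop as a runnable Python sketch; train and measure_acc stand in for whatever learner and scoring function you are evaluating (they are placeholders, not a fixed API):

    import numpy as np

    def cross_validate(X, y, train, measure_acc, k=10, seed=0):
        """k-fold CV: each fold serves as the test set exactly once."""
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(len(y)), k)
        accs = []
        for i in range(k):
            test = folds[i]
            tr = np.concatenate([folds[j] for j in range(k) if j != i])
            model = train(X[tr], y[tr])
            accs.append(measure_acc(model, X[test], y[test]))
        return np.mean(accs), np.std(accs, ddof=1)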

CV in pix [Figure: the original data [X;Y] is randomly shuffled into [X';Y'], then partitioned k ways into [X1';Y1'] ... [Xk';Yk'], giving k train/test sets and k accuracies, e.g., 53.7%, 85.1%, 73.2%]

But is it really learning? Now we know how well our models are performing But are they really learning? Maybe any classifier would do as well E.g., a default classifier (pick the most likely class) or a random classifier How can we tell if the model is learning anything? Go back to first definitions What does it mean to learn something?
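A sketch of the two baselines just mentioned (function names are illustrative); a learned model should comfortably beat both on held-out data before we call it “learning”:

    import numpy as np

    def majority_baseline_acc(y_train, y_test):
        """Accuracy of always predicting the most common training class."""
        values, counts = np.unique(y_train, return_counts=True)
        return np.mean(y_test == values[np.argmax(counts)])

    def random_baseline_acc(y_train, y_test, seed=0):
        """Accuracy of guessing labels uniformly at random."""
        rng = np.random.default_rng(seed)
        return np.mean(rng.choice(np.unique(y_train), size=len(y_test)) == y_test)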

The learning curve Train on successively larger fractions of data Watch how accuracy (performance) changes
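One way to sketch that procedure (the fractions and names are illustrative, not from the slides):

    import numpy as np

    def learning_curve(X_tr, y_tr, X_te, y_te, train, measure_acc,
                       fractions=(0.1, 0.2, 0.4, 0.6, 0.8, 1.0), seed=0):
        """Train on growing subsamples of the training set; test on a fixed test set."""
        rng = np.random.default_rng(seed)
        order = rng.permutation(len(y_tr))
        accs = []
        for frac in fractions:
            sub = order[:max(1, int(frac * len(order)))]   # subsample of the training set
            model = train(X_tr[sub], y_tr[sub])
            accs.append(measure_acc(model, X_te, y_te))
        return list(fractions), accs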

Measuring variance Cross validation helps you get better estimate of accuracy for small data Randomization (shuffling the data) helps guard against poor splits/ordering of the data Learning curves help assess learning rate/asymptotic accuracy Still one big missing component: variance Definition: Variance of a classifier is the fraction of error due to the specific data set it’s trained on

Measuring variance Variance tells you how much you expect your classifier/performance to change when you train it on a new (but similar) data set E.g., take 5 samplings of a data source; train/test 5 classifiers Accuracies: 74.2, 90.3, 58.1, 80.6, 90.3 Mean accuracy: 78.7% Std dev of acc: 13.4% Variance is usually a function of both classifier and data source High variance classifiers are very susceptible to small changes in data
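Checking the arithmetic on this slide (the 13.4% figure is the sample standard deviation, i.e., ddof=1):

    import numpy as np

    accs = np.array([74.2, 90.3, 58.1, 80.6, 90.3])
    print(accs.mean())        # 78.7
    print(accs.std(ddof=1))   # 13.39... ~= 13.4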

Putting it all together Suppose you want to measure the expected accuracy of your classifier, assess learning rate, and measure variance, all at the same time?

    for (i=0; i<10; ++i) {                    // variance reps
      shuffle data
      do 10-way CV partition of data
      for each train/test partition {         // xval
        for (pct=0.1; pct<=0.9; pct+=0.1) {   // LC
          subsample pct fraction of training set
          train on subsample, test on test set
        }
      }
      avg across all folds of CV partition
      generate learning curve for this partition
    }
    get mean and std across all curves
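A runnable Python sketch of the same protocol, under the same placeholder assumptions as before (train and measure_acc are whatever learner and scorer you plug in):

    import numpy as np

    def variance_cv_learning_curves(X, y, train, measure_acc, reps=10, k=10,
                                    fractions=(0.1, 0.2, 0.3, 0.4, 0.5,
                                               0.6, 0.7, 0.8, 0.9),
                                    seed=0):
        """For each rep: shuffle, do k-way CV, and inside each fold build a
        learning curve from growing subsamples of the training partition.
        Returns the mean and std of the curves across reps."""
        rng = np.random.default_rng(seed)
        curves = []                                        # one fold-averaged curve per rep
        for _ in range(reps):                              # variance reps
            folds = np.array_split(rng.permutation(len(y)), k)
            fold_curves = []
            for i in range(k):                             # cross-validation
                test = folds[i]
                tr = np.concatenate([folds[j] for j in range(k) if j != i])
                accs = []
                for pct in fractions:                      # learning curve
                    sub = tr[:max(1, int(pct * len(tr)))]
                    model = train(X[sub], y[sub])
                    accs.append(measure_acc(model, X[test], y[test]))
                fold_curves.append(accs)
            curves.append(np.mean(fold_curves, axis=0))    # average across folds
        curves = np.array(curves)
        return curves.mean(axis=0), curves.std(axis=0, ddof=1)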

Putting it all together “hepatitis” data

5 minutes of math... Decision trees are non-metric Don’t know anything about relations between instances, except sets induced by feature splits Often, we have well-defined distances between points Idea of distance encapsulated by a metric

5 minutes of math... Definition: a metric function d(x,y) is a function that obeys the following properties: Identity: d(x,y) = 0 if and only if x = y Symmetry: d(x,y) = d(y,x) Triangle inequality: d(x,z) <= d(x,y) + d(y,z)

5 minutes of math... Examples: Euclidean distance: d(x,y) = sqrt( sum_i (x_i - y_i)^2 ) * Note: omitting the square root preserves the ordering of distances (though the squared distance is no longer a true metric, since it can violate the triangle inequality), so it usually won’t change our results
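Both variants as a minimal code sketch; since sqrt is monotonic, they rank neighbors identically:

    import numpy as np

    def euclidean(x, y):
        return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

    def squared_euclidean(x, y):
        # Cheaper: skips the sqrt, same nearest neighbor.
        return np.sum((np.asarray(x) - np.asarray(y)) ** 2)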

5 minutes of math... Examples: Manhattan (taxicab) distance: d(x,y) = sum_i |x_i - y_i| Distance travelled along a grid between two points No diagonals allowed

5 minutes of math... Examples: What if some attribute is categorical? Typical answer is 0/1 distance: For each attribute, add 1 if the instances differ in that attribute, else 0
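One illustrative way to combine the Manhattan distance from the previous slide with this 0/1 rule for categorical attributes (helper names are mine, not the book’s):

    import numpy as np

    def manhattan(x, y):
        """Sum of per-axis absolute differences (no diagonals)."""
        return np.sum(np.abs(np.asarray(x) - np.asarray(y)))

    def mixed_distance(x, y, categorical):
        """Numeric attributes contribute |x_i - y_i|;
        categorical ones contribute 1 if the values differ, else 0."""
        d = 0.0
        for xi, yi, is_cat in zip(x, y, categorical):
            d += (xi != yi) if is_cat else abs(xi - yi)
        return d

    # e.g. mixed_distance([1.0, "red"], [3.0, "blue"], [False, True]) == 3.0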

Distances in classification Nearest neighbor: find the nearest instance to the query point in feature space, return the class of that instance Simplest possible distance-based classifier With more notation: class(x) = y_i* where i* = argmin_i d(x, x_i)
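The same rule as a minimal code sketch, assuming numeric features and Euclidean distance:

    import numpy as np

    def nn_classify(X_train, y_train, x_query):
        """Label of the single training instance nearest to x_query."""
        dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
        return y_train[np.argmin(dists)]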

Properties of NN Training time of NN? Classification time? Geometry of model?

NN miscellany Slight generalization: k-nearest neighbors (k-NN) Find the k training instances closest to the query point Vote among them for the label Q: How does this affect the system? Gotcha: unscaled dimensions What happens if one axis is measured in microns and one in lightyears? Usual trick is to scale each axis to the [-1,1] range
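A hedged sketch of both ideas on this slide (helper names are mine): rescale each axis to [-1,1] before computing distances, then vote among the k nearest training instances.

    import numpy as np

    def scale_to_unit_range(X):
        """Rescale each column of X to the [-1, 1] range."""
        lo, hi = X.min(axis=0), X.max(axis=0)
        span = np.where(hi > lo, hi - lo, 1.0)      # avoid divide-by-zero on constant axes
        return 2.0 * (X - lo) / span - 1.0

    def knn_classify(X_train, y_train, x_query, k=3):
        """Majority vote among the k nearest neighbors (Euclidean distance)."""
        dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
        nearest = np.argsort(dists)[:k]
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        return labels[np.argmax(counts)]

Note that the query point must be scaled with the same lo and span as the training data, not rescaled on its own.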