Experiments in Machine Learning

Presentation transcript:

Experiments in Machine Learning (COMP61011)

Scientists vs Normal People

Learning Algorithm: split our data randomly, train a model… then test it! Why do we split the data?

The Most Important Concept in Machine Learning… Looks good so far… Oh no! Mistakes! What happened? We didn't have all the data. We can never assume that we do. This is called "OVER-FITTING" to the small dataset.

[Figure: a table of inputs and their labels, divided by a 50:50 (random) split into two halves.]

Training set: train a K-NN or Perceptron on this… Ideally it should be small (smaller is better, since it leaves more data for testing), but if it is too small you'll make many mistakes on the testing set. Testing set: … then test the model on this! It "simulates" what it might be like to see new data in the future, so it needs to be quite big; bigger is better.
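As a rough illustration of the split-train-test procedure, here is a Python/NumPy sketch (not part of the course's MATLAB material; the data values and the simple 1-D k-NN are invented purely for this example):

```python
import numpy as np

# Hypothetical 1-D inputs and 0/1 labels, invented purely for illustration.
X = np.array([15, 95, 33, 90, 78, 70, 45, 80, 18, 35, 65, 50], dtype=float)
y = np.array([ 0,  1,  0,  1,  1,  1,  0,  1,  0,  0,  1,  0])

# 50:50 (random) split: shuffle the indices, then cut the dataset in half.
rng = np.random.default_rng(0)
idx = rng.permutation(len(X))
half = len(X) // 2
X_train, y_train = X[idx[:half]], y[idx[:half]]
X_test,  y_test  = X[idx[half:]], y[idx[half:]]

def knn_predict(x, X_tr, y_tr, k=1):
    """Predict the label of x by majority vote among its k nearest training points."""
    dists = np.abs(X_tr - x)             # distance to every training point (1-D data)
    nearest = np.argsort(dists)[:k]      # indices of the k closest points
    return np.bincount(y_tr[nearest]).argmax()

test_predictions = np.array([knn_predict(x, X_train, y_train, k=1) for x in X_test])
```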

Build a model (K-NN, perceptron, decision tree, etc.) on the training set, then count how many incorrect predictions it makes on the testing set. The percentage of incorrect predictions is called the "error": measured on the training set it is the "training" error, measured on the testing set it is the "testing" error.
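Continuing the sketch above, the training and testing errors are just the fraction of wrong predictions on each set, expressed as a percentage:

```python
def error_rate(X_eval, y_eval, X_tr, y_tr, k=1):
    """Percentage of incorrect k-NN predictions on the set (X_eval, y_eval)."""
    preds = np.array([knn_predict(x, X_tr, y_tr, k=k) for x in X_eval])
    return 100.0 * np.mean(preds != y_eval)

train_error = error_rate(X_train, y_train, X_train, y_train, k=1)
test_error  = error_rate(X_test,  y_test,  X_train, y_train, k=1)
print(f"training error: {train_error:.1f}%   testing error: {test_error:.1f}%")
```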

Classifying '3' versus '8' digits: plotting error as a function of K (as in the K-NN) shows that training data can behave very differently to testing data! [Figure: training error and testing error plotted against K.]
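A sweep over K to produce such a curve might look like this (still the toy sketch from above; with real digit data the inputs would be image features rather than single numbers):

```python
ks = [1, 3, 5, 7]
train_errors = [error_rate(X_train, y_train, X_train, y_train, k=k) for k in ks]
test_errors  = [error_rate(X_test,  y_test,  X_train, y_train, k=k) for k in ks]
# The two curves can behave very differently; that gap is what the slide is pointing at.
```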

Which is the "best" value of K? A different random split of the data gives different performance! [Figure: accuracy (%) against the K value used in the knn classifier. MATLAB: "help errorbar"]

Error bars (a.k.a. "confidence intervals") plot the average and the standard deviation (spread) of the values; a wider spread means less stability. [Figure: accuracy (%) with error bars against the K value used in the knn classifier. MATLAB: "help errorbar"]
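MATLAB's errorbar has a matplotlib counterpart; as a hedged sketch, repeating the random split several times on the toy data and plotting mean accuracy with its standard deviation for each K:

```python
import matplotlib.pyplot as plt

ks = [1, 3, 5, 7]
means, spreads = [], []
for k in ks:
    accuracies = []
    for seed in range(10):                       # 10 different random 50:50 splits
        r = np.random.default_rng(seed)
        order = r.permutation(len(X))
        tr, te = order[:len(X) // 2], order[len(X) // 2:]
        preds = np.array([knn_predict(x, X[tr], y[tr], k=k) for x in X[te]])
        accuracies.append(100.0 * np.mean(preds == y[te]))
    means.append(np.mean(accuracies))
    spreads.append(np.std(accuracies))           # wider spread = less stability

plt.errorbar(ks, means, yerr=spreads, marker='o')
plt.xlabel("K value used in knn classifier")
plt.ylabel("Accuracy (%)")
plt.show()
```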

Cross-validation (MATLAB: "help crossfold"). We could do repeated random splits… but there is another way. Break the data evenly into N chunks; leave one chunk out; train on the remaining N-1 chunks; test on the chunk you left out; repeat until all chunks have been used to test; then plot the average and error bars over the N chunks. [Figure: all the data divided into 5 chunks. Train on these (all together), test on this… then repeat, but test on a different chunk.]
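The same N-chunk procedure, hand-rolled in Python for the toy data (a sketch only; the course itself points you at the MATLAB tooling):

```python
n_chunks = 5
chunks = np.array_split(np.random.default_rng(0).permutation(len(X)), n_chunks)

fold_accuracies = []
for i, test_idx in enumerate(chunks):
    # Train on the other N-1 chunks, test on the chunk we left out.
    train_idx = np.concatenate([c for j, c in enumerate(chunks) if j != i])
    preds = np.array([knn_predict(x, X[train_idx], y[train_idx], k=3) for x in X[test_idx]])
    fold_accuracies.append(100.0 * np.mean(preds == y[test_idx]))

# Average and spread over the N chunks, exactly what you would plot with error bars.
print(np.mean(fold_accuracies), np.std(fold_accuracies))
```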

What factors should affect our decision? Accuracy - how accurate is it? Training time and space complexity - how long does it take to train, and how much memory overhead does it take? Testing time and space complexity - how long does it take to execute the model for a new datapoint? Interpretability - how easily can we explain why it does what it does? [Figure: accuracy (%) of learning algorithms A, B, C and D.]
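Accuracy aside, the time costs can be measured directly; a rough sketch with Python's standard timer, using the toy 1-D k-NN from earlier ("training" a k-NN is just storing the data, so it is nearly free; the cost shows up at prediction time):

```python
import time

start = time.perf_counter()
stored_X, stored_y = X_train.copy(), y_train.copy()   # "training" a k-NN = storing the data
training_time = time.perf_counter() - start

start = time.perf_counter()
_ = knn_predict(X_test[0], stored_X, stored_y, k=3)   # cost of classifying one new datapoint
testing_time = time.perf_counter() - start

print(f"train: {training_time:.6f}s   classify one point: {testing_time:.6f}s")
```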

Best testing error at K=3, about 3.2%. Is this “good”?

Zip-codes: “8” is very common on the West Coast, while “3” is rare. Making a mistake will mean your Florida post ends up in Las Vegas!

Sometimes, classes are rare, so your learner will not see many of them. What if, in the testing phase, you saw 1,000 digits: 32 instances of "3" and 968 instances of "8"? Then 3.2% error would be achieved by just saying "8" to everything!

Solution? Measure accuracy on each class separately. The say-"8"-to-everything classifier scores 0% correct on the 32 "3" instances and 100% correct on the "8" instances.
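A minimal sketch of measuring accuracy on each class separately, encoding the rare "3" class as 1 and "8" as 0 (an encoding assumed for this example and carried through the snippets below):

```python
def per_class_accuracy(y_true, y_pred):
    """Accuracy computed separately for each class that appears in y_true."""
    return {c: 100.0 * np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)}

# A classifier that says "8" (class 0) to everything, on 32 threes and 968 eights:
y_true = np.array([1] * 32 + [0] * 968)
y_pred = np.zeros_like(y_true)
print(per_class_accuracy(y_true, y_pred))   # {0: 100.0, 1: 0.0} - perfect on "8", hopeless on "3"
```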

ROC analysis. Our obvious solution is in fact backed up by a whole statistical framework: Receiver Operating Characteristic (ROC) analysis, developed in WW2 to assess radar operators. "How good is the radar operator at spotting incoming bombers?" False positives – i.e. falsely predicting a bombing raid. False negatives – i.e. missing an incoming bomber (VERY BAD!)

ROC analysis. The "3" digits are like the bombers: rare events, but costly if we misclassify! False positives – i.e. falsely predicting an event. False negatives – i.e. missing an incoming event. Similarly, we have "true positives" (TP) and "true negatives" (TN). These four counts are laid out with truth as rows and prediction as columns:

              Prediction
               0     1
   Truth  0   TN    FP
          1   FN    TP

Building a "Confusion" Matrix.

              Prediction
               0     1
   Truth  0   TN    FP
          1   FN    TP

Sensitivity = TP / (TP + FN) … the chance of spotting a "3" when presented with one (i.e. accuracy on class "3").
Specificity = TN / (TN + FP) … the chance of spotting an "8" when presented with one (i.e. accuracy on class "8").
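Continuing the per-class example, a sketch of building the confusion-matrix counts and the two rates from 0/1 arrays (class "3" treated as the positive class, as assumed above):

```python
def confusion_counts(y_true, y_pred):
    """Return TN, FP, FN, TP, treating label 1 as the positive ("3") class."""
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tp = np.sum((y_true == 1) & (y_pred == 1))
    return tn, fp, fn, tp

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)   # accuracy on the "3" class
specificity = tn / (tn + fp)   # accuracy on the "8" class
```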

ROC analysis… your turn to think…

              Prediction
               0     1
   Truth  0   60    30     (TN = 60, FP = 30)
          1   80    20     (FN = 80, TP = 20)

Sensitivity = TP / (TP + FN) = ?       Specificity = TN / (TN + FP) = ?

60 + 30 = 90 examples in the dataset were class 0; 80 + 20 = 100 examples in the dataset were class 1; 90 + 100 = 190 examples in the data overall.
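Once you have had a go, you can check your answer by plugging the counts straight into the formulas:

```python
tn, fp, fn, tp = 60, 30, 80, 20
sensitivity = tp / (tp + fn)   # 20 / 100 = 0.20
specificity = tn / (tn + fp)   # 60 / 90 ≈ 0.67
```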

ROC analysis – incorporating costs… FP = total number of false positives made on the testing data; FN = total number of false negatives made on the testing data. Weighting them is useful if one is more important than the other. For example, a predictor might… miss a case of a particularly horrible disease (FALSE NEGATIVE), or… send a patient for painful surgery when it is unnecessary (FALSE POSITIVE).
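The slide stops short of a formula, but one common (hypothetical) way to fold in costs is a weighted sum of the two error counts, with the penalties chosen by hand for the application; a sketch:

```python
cost_fp = 1.0    # hypothetical penalty for unnecessary surgery (false positive)
cost_fn = 20.0   # hypothetical, much larger penalty for a missed disease (false negative)
total_cost = cost_fp * fp + cost_fn * fn   # fp, fn as counted in the confusion-matrix sketch above
```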

Experiments in Machine Learning. Learning algorithms depend on the data you give them; some algorithms are more "stable" than others, and we must take this into account in experiments. Cross-validation is one way to monitor stability: plot confidence intervals! ROC analysis can help us if there are imbalanced classes, or if false positives/negatives have different costs.

Conclusion. Reading for this week: 1. My notes; 2. Paper: "A training algorithm for optimal margin classifiers"; 3. Paper: "A Practical Guide to SVM classification". The material above is part of the course, i.e. non-optional reading. In your mini-projects, you will be expected to use techniques acquired from the above in order to pick up the highest grades.