Steep learning curves Reading: Bishop Ch. 3.0, 3.1.

Administrivia Reminder: Microsoft on campus for recruiting next Mon, Feb 5, FEC141, 11:00 AM. All welcome.

Viewing and re-viewing Last time: (4)5 minutes of math (function optimization); measuring performance. Today: cross-validation; learning curves.

Separation of train & test Fundamental principle (1st amendment of ML): Don’t evaluate accuracy (performance) of your classifier (learning system) on the same data used to train it!

Holdout data Usual to “hold out” a separate set of data for testing; it is not used to train the classifier. A.k.a. test set, holdout set, evaluation set, etc. E.g., accuracy measured on the training data is the training set (or empirical) accuracy; accuracy measured on the held-out data is the test set (or generalization) accuracy.
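A minimal sketch of a holdout split, assuming scikit-learn and synthetic stand-in data (the course data sets aren't reproduced here), with a decision tree as an example learner:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in data
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    # Hold out 30% of the examples; the classifier never sees them during training
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("training (empirical) accuracy: ", clf.score(X_train, y_train))
    print("test (generalization) accuracy:", clf.score(X_test, y_test))

An unpruned tree typically scores near 100% on the training set but noticeably lower on the held-out set, which is exactly why the two numbers are kept separate.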

Gotchas... What if you’re unlucky when you split the data into train/test? E.g., all train data are class A and all test data are class B, or no “red” things show up in the training data. Best answer: stratification. Try to make sure class (and feature) ratios are the same in the train and test sets (and the same as in the original data). Why does this work?
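One hedged way to stratify a split, using the stratify argument of scikit-learn's train_test_split (the imbalanced synthetic data here is only illustrative):

    from collections import Counter
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Imbalanced stand-in data: roughly 80% of one class, 20% of the other
    X, y = make_classification(n_samples=100, weights=[0.8, 0.2], random_state=0)

    # stratify=y keeps the class ratio (roughly) the same in both halves of the split
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
    print("train class counts:", Counter(y_tr))
    print("test class counts: ", Counter(y_te))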

Gotchas... What if you’re unlucky when you split the data into train/test? E.g., all train data are class A and all test data are class B, or no “red” things show up in the training data. Almost as good: randomization. Shuffle the data randomly before splitting. Why does this work?
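A tiny NumPy sketch of shuffling before splitting, for a toy data set whose labels happen to arrive sorted by class (the arrays are made up purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.arange(20).reshape(10, 2)    # toy feature matrix, 10 examples
    y = np.array([0] * 5 + [1] * 5)     # labels arrive sorted by class

    # Shuffle rows of X and y together, so a simple front/back split mixes both classes
    perm = rng.permutation(len(y))
    X, y = X[perm], y[perm]
    X_train, X_test, y_train, y_test = X[:7], X[7:], y[:7], y[7:]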

Gotchas What if the data set is small? N=50 or N=20 or even N=10. Can’t do perfect stratification; can’t get a representative accuracy estimate from any single train/test split.

Gotchas No good answer. Common answer: cross-validation. Shuffle the data vectors, break them into k chunks, train on the first k-1 chunks, test on the last one; repeat with a different chunk held out each time, and average all the test accuracies together.

Gotchas In code:
    for (i=0; i<k; ++i) {
      [Xtrain, Ytrain, Xtest, Ytest] = splitData(X, Y, N/k, i);
      model[i] = train(Xtrain, Ytrain);
      cvAccs[i] = measureAcc(model[i], Xtest, Ytest);
    }
    avgAcc = mean(cvAccs);
    stdAcc = stddev(cvAccs);
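The pseudocode above isn't tied to a particular language; a runnable Python sketch of the same loop, assuming scikit-learn's KFold and a decision tree as a stand-in learner, might look like:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=150, random_state=0)   # stand-in data
    k = 10
    cv_accs = []

    # One train/test round per fold: train on k-1 chunks, test on the held-out chunk
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
        cv_accs.append(model.score(X[test_idx], y[test_idx]))

    print("mean CV accuracy:  ", np.mean(cv_accs))
    print("std of CV accuracy:", np.std(cv_accs, ddof=1))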

CV in pix [Figure: the original data [X; y] is randomly shuffled into [X’; y’], then split by a k-way partition into chunks [X1’ Y1’], [X2’ Y2’], ..., [Xk’ Yk’], giving k train/test sets and k accuracies (e.g., 53.7%, 85.1%, 73.2%).]

But is it really learning? Now we know how well our models are performing, but are they really learning? Maybe any classifier would do as well, e.g., a default classifier (always pick the most likely class) or a random classifier. How can we tell if the model is learning anything?
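As one hedged illustration, scikit-learn's DummyClassifier can stand in for the default and random baselines; a learned model is only interesting to the extent that it beats numbers like these:

    from sklearn.datasets import make_classification
    from sklearn.dummy import DummyClassifier
    from sklearn.model_selection import cross_val_score

    # Imbalanced stand-in data, so "predict the most likely class" already looks decent
    X, y = make_classification(n_samples=200, weights=[0.7, 0.3], random_state=0)

    # Default classifier: always predict the most frequent class
    default_acc = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=10).mean()
    # Random classifier: guess classes uniformly at random
    random_acc = cross_val_score(DummyClassifier(strategy="uniform", random_state=0), X, y, cv=10).mean()
    print("default baseline:", default_acc, " random baseline:", random_acc)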

The learning curve Train on successively larger fractions of the data and watch how accuracy (performance) changes. [Figure: example learning curves labeled “Learning”, “Static classifier (no learning)”, and “Anti-learning (forgetting)”.]
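A possible sketch of generating such a curve, assuming scikit-learn's learning_curve helper and synthetic stand-in data rather than the course's data sets:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import learning_curve
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)   # stand-in data

    # Train on 10%, 20%, ..., 100% of the training folds; score on the held-out folds
    sizes, train_scores, test_scores = learning_curve(
        DecisionTreeClassifier(random_state=0), X, y,
        train_sizes=np.linspace(0.1, 1.0, 10), cv=5)

    for n, acc in zip(sizes, test_scores.mean(axis=1)):
        print(f"{n:4d} training examples -> mean test accuracy {acc:.3f}")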

Measuring variance Cross-validation helps you get a better estimate of accuracy for small data. Randomization (shuffling the data) helps guard against poor splits/orderings of the data. Learning curves help assess learning rate and asymptotic accuracy. Still one big missing component: variance. Definition: the variance of a classifier is the fraction of its error due to the specific data set it’s trained on.

Measuring variance Variance tells you how much you expect your classifier’s performance to change when you train it on a new (but similar) data set. E.g., take 5 samplings of a data source and train/test 5 classifiers. Accuracies: 74.2, 90.3, 58.1, 80.6, 90.3. Mean accuracy: 78.7%. Std dev of accuracy: 13.4%. Variance is usually a function of both the classifier and the data source; high-variance classifiers are very susceptible to small changes in the data.
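Reproducing the slide's arithmetic with NumPy (the five accuracies are the ones listed above; the 13.4% figure corresponds to the sample standard deviation, ddof=1):

    import numpy as np

    # One accuracy per resampled training set, as on the slide
    accs = np.array([74.2, 90.3, 58.1, 80.6, 90.3])
    print("mean accuracy:  ", accs.mean())        # 78.7
    print("std of accuracy:", accs.std(ddof=1))   # ~13.4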

Putting it all together Suppose you want to measure the expected accuracy of your classifier, assess learning rate, and measure variance, all at the same time?
    for (i=0; i<10; ++i) {                      // variance reps
      shuffle data
      do 10-way CV partition of data
      for each train/test partition {           // xval
        for (pct=0.1; pct<=0.9; pct+=0.1) {     // LC
          subsample pct fraction of training set
          train on subsample, test on test set
        }
      }
      avg across all folds of the CV partition
      generate learning curve for this partition
    }
    get mean and std across all curves
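A hedged Python sketch of the same nested experiment, assuming scikit-learn, synthetic stand-in data, and a decision tree as the example learner (names such as fractions and curves are illustrative, not prescribed by the slides):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import StratifiedKFold
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=300, random_state=0)    # stand-in data
    fractions = np.arange(0.1, 1.0, 0.1)                         # 10%, ..., 90% of the training fold
    curves = []                                                  # one learning curve per variance rep

    for rep in range(10):                                        # variance reps
        rng = np.random.default_rng(rep)
        curve = np.zeros(len(fractions))
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=rep)
        for train_idx, test_idx in cv.split(X, y):               # cross-validation
            for j, pct in enumerate(fractions):                  # learning curve
                sub = rng.choice(train_idx, size=int(pct * len(train_idx)), replace=False)
                model = DecisionTreeClassifier(random_state=0).fit(X[sub], y[sub])
                curve[j] += model.score(X[test_idx], y[test_idx])
        curves.append(curve / 10)                                # average over the 10 folds

    curves = np.array(curves)
    print("mean learning curve:", curves.mean(axis=0))
    print("std across reps:    ", curves.std(axis=0, ddof=1))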

Putting it all together [Figure: results on the “hepatitis” data.]

5 minutes of math... Decision trees make very few assumptions about the data: they don’t know anything about relations between instances, except the sets induced by feature splits, and have no sense of spatial/topological relations among the data. Often, our data is real, honest-to-Cthulhu, mathematically sound vector data (as opposed to the informal sense of “vector” that I have used so far), and it often comes endowed with a natural inner product and norm.

5 minutes of math Mathematicians like to study the properties of spaces in general. From linear algebra, you’ve already met the notion of a vector space. Definition: a vector space, V, is a set of elements (vectors) plus a scalar field, F, such that the following properties hold: vector addition: a + b ∈ V for all a, b ∈ V; scalar multiplication: c a ∈ V for all c ∈ F and a ∈ V; plus linearity, commutativity, associativity, etc.

5 minutes of math By itself, a vector space is only partially useful; it gets more useful when you add a norm and an inner product.

5 minutes of math Definition: a norm, ||.||, is a function of a single vector (∈ V) that returns a real scalar, such that for all a, b ∈ V and c ∈ F: ||a|| ≥ 0; ||c a|| = |c| ||a||; ||a + b|| ≤ ||a|| + ||b||. Intuition: the norm gives you the length of a vector. A vector space + norm (plus completeness) ⇒ Banach space (*)

5 minutes of math Definition: an inner product, 〈∙, ∙〉, is a function of two vectors (∈ V) that returns a scalar (∈ F) such that: symmetry: 〈a, b〉 = 〈b, a〉; linearity in the first variable: 〈c a + b, d〉 = c〈a, d〉 + 〈b, d〉; non-negativity: 〈a, a〉 ≥ 0; non-degeneracy: 〈a, a〉 = 0 ⇔ a = 0. A vector space + inner product (plus completeness) ⇒ Hilbert space (*)
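A quick NumPy check of these properties for the ordinary Euclidean norm and dot product (one concrete norm and inner product on R^n, not the only possible choice):

    import numpy as np

    a, b = np.array([3.0, 4.0]), np.array([1.0, -2.0])
    c = -2.5

    # Norm: non-negativity, homogeneity, triangle inequality
    print(np.linalg.norm(a))                                                   # 5.0, the "length" of a
    print(np.isclose(np.linalg.norm(c * a), abs(c) * np.linalg.norm(a)))       # True
    print(np.linalg.norm(a + b) <= np.linalg.norm(a) + np.linalg.norm(b))      # True

    # Inner product (here, the dot product): symmetry and non-negativity
    print(np.dot(a, b) == np.dot(b, a))   # True
    print(np.dot(a, a) >= 0)              # True; it also equals ||a||**2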