Steep learning curves
Reading: DH&S, Ch. 4.6, 4.5.

Administrivia
HW1 due now. Late days are ticking... No other news today.

Viewing and re-viewing
Last time: HW1 FAQ; 5 minutes of math: function optimization; measuring performance; cross-validation.
Today: learning curves; metrics; the nearest-neighbor rule.

Exercise
Given the function (formula not preserved): find the extremum, and show that the extremum is really a minimum.

Mea culpa! I copied the wrong example out of the book. Oops. My bad. You guys did a great job figuring it out, though...

The saddle point
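Whether a critical point is a minimum, a maximum, or a saddle point is usually settled with the second-derivative (Hessian) test. The sketch below is a minimal Python version for functions of two variables. The names `hessian_2d` and `classify_critical_point` are hypothetical, the code assumes the point passed in already has zero gradient, and it estimates the second derivatives with central finite differences rather than symbolically.

```python
def hessian_2d(f, x0, y0, h=1e-4):
    """Estimate (f_xx, f_xy, f_yy) at (x0, y0) with central finite differences."""
    fxx = (f(x0 + h, y0) - 2 * f(x0, y0) + f(x0 - h, y0)) / h**2
    fyy = (f(x0, y0 + h) - 2 * f(x0, y0) + f(x0, y0 - h)) / h**2
    fxy = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
           - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h**2)
    return fxx, fxy, fyy


def classify_critical_point(f, x0, y0, tol=1e-6):
    """Second-derivative test; assumes the gradient of f is zero at (x0, y0)."""
    fxx, fxy, fyy = hessian_2d(f, x0, y0)
    det = fxx * fyy - fxy**2          # determinant of the 2x2 Hessian
    if det > tol:
        return "minimum" if fxx > 0 else "maximum"
    if det < -tol:
        return "saddle"               # curvature has opposite signs along two directions
    return "inconclusive"             # the test says nothing when det(H) is 0


print(classify_critical_point(lambda x, y: x**2 + y**2, 0.0, 0.0))  # minimum
print(classify_critical_point(lambda x, y: x**2 - y**2, 0.0, 0.0))  # saddle
```

A saddle point has zero gradient, just like a minimum, which is why checking that the gradient vanishes is not enough on its own.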

Cross-validation in words
1. Shuffle the data vectors.
2. Break them into k chunks.
3. Train on the first k-1 chunks.
4. Test on the last chunk.
5. Repeat, with a different chunk held out each time.
6. Average all the test accuracies together.
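The cross-validation procedure can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: `k_fold_cv` and the majority-class toy learner are hypothetical names, the learner is assumed to be passed in as a pair of `train`/`predict` callables, and the shuffled indices are dealt round-robin into k chunks, which gives the same kind of partition as cutting the shuffled list into k contiguous pieces.

```python
import random
from collections import Counter


def k_fold_cv(X, y, train, predict, k=10, seed=0):
    """Return the k held-out accuracies of a train/predict pair."""
    if not 2 <= k <= len(X):
        raise ValueError("need 2 <= k <= number of instances")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)              # shuffle the data vectors
    folds = [idx[i::k] for i in range(k)]         # k chunks, sizes differ by at most 1
    accuracies = []
    for held_out, test_idx in enumerate(folds):   # each chunk is the test set once
        train_idx = [j for m, fold in enumerate(folds) if m != held_out for j in fold]
        model = train([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies


# Hypothetical toy learner: always predict the most common training label.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(10))
y = [0] * 7 + [1] * 3
accs = k_fold_cv(X, y, train_majority, predict_majority, k=5)
print(sum(accs) / len(accs))  # 0.7
```

Passing the learner in as two functions keeps the CV loop independent of any particular classifier.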

CV in pix
(Figure: original data [X; y] → random shuffle [X'; y'] → k-way partition [X1'; y1'], [X2'; y2'], ..., [Xk'; yk'] → k train/test sets → k accuracies, e.g. 53.7%, 85.1%, 73.2%, ...)

But is it really learning?
Now we know how well our models are performing. But are they really learning? Maybe any classifier would do as well, e.g., a default classifier (always pick the most likely class) or a random classifier. How can we tell whether the model is learning anything?
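One quick check is to compare a model's test accuracy against the two baselines named above. The sketch below assumes "random classifier" means guessing uniformly among the labels seen in training; that reading, and the function names `majority_baseline` and `random_baseline`, are assumptions for illustration.

```python
import random
from collections import Counter


def majority_baseline(train_y, test_y):
    """Accuracy of always predicting the most frequent training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return sum(label == majority for label in test_y) / len(test_y)


def random_baseline(train_y, test_y, seed=0):
    """Accuracy of guessing uniformly among the labels seen in training."""
    rng = random.Random(seed)
    labels = sorted(set(train_y))
    return sum(rng.choice(labels) == label for label in test_y) / len(test_y)


train_y = ["neg"] * 80 + ["pos"] * 20
test_y = ["neg"] * 40 + ["pos"] * 10
print(majority_baseline(train_y, test_y))  # 0.8
print(random_baseline(train_y, test_y))
```

On skewed data like this, a model reporting 80% accuracy has learned nothing beyond the class prior.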

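The learning curve
Train on successively larger fractions of the data and watch how accuracy (performance) changes. Typical shapes: learning (accuracy rises as data is added), a static classifier (no learning; the curve stays flat), and anti-learning (forgetting; accuracy falls).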
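A single learning curve on one fixed train/test split might look like the sketch below. The names `learning_curve`, `train_centroids`, and `predict_centroid` are hypothetical; the toy learner (nearest class mean on 1-D data) is only a stand-in for a real classifier, and the training subsets are nested so that each larger fraction contains the smaller ones.

```python
import random


def learning_curve(train_X, train_y, test_X, test_y, train, predict,
                   fractions=(0.1, 0.25, 0.5, 0.75, 1.0), seed=0):
    """Return [(fraction, test accuracy)] for nested, growing training subsets."""
    order = list(range(len(train_X)))
    random.Random(seed).shuffle(order)
    curve = []
    for frac in fractions:
        n = max(1, round(frac * len(order)))
        subset = order[:n]  # each subset contains all the smaller ones
        model = train([train_X[i] for i in subset], [train_y[i] for i in subset])
        correct = sum(predict(model, x) == t for x, t in zip(test_X, test_y))
        curve.append((frac, correct / len(test_X)))
    return curve


# Hypothetical toy learner for 1-D data: predict the class whose mean is closest.
def train_centroids(X, y):
    return {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}


def predict_centroid(model, x):
    return min(model, key=lambda c: abs(x - model[c]))


train_X = [float(v) for v in range(20)]
train_y = [int(v >= 10) for v in range(20)]
test_X = [0.5, 3.2, 8.7, 11.4, 15.1, 18.9]
test_y = [0, 0, 0, 1, 1, 1]
for frac, acc in learning_curve(train_X, train_y, test_X, test_y,
                                train_centroids, predict_centroid):
    print(f"{frac:.2f}  {acc:.2f}")
```

A flat curve here would suggest the static-classifier case; a falling one, anti-learning.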

Measuring variance
Cross-validation helps you get a better estimate of accuracy when data is scarce. Randomization (shuffling the data) helps guard against poor splits or an unlucky ordering of the data. Learning curves help assess the learning rate and the asymptotic accuracy. Still one big missing component: variance.
Definition: the variance of a classifier is the fraction of its error due to the specific data set it’s trained on.

Measuring variance
Variance tells you how much you expect your classifier (and its performance) to change when you train it on a new but similar data set. E.g., take 5 samplings of a data source and train/test 5 classifiers:
Accuracies: 74.2, 90.3, 58.1, 80.6, 90.3
Mean accuracy: 78.7%
Std dev of accuracy: 13.4%
Variance is usually a function of both the classifier and the data source. High-variance classifiers are very susceptible to small changes in the data.
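The summary numbers above can be reproduced with Python's standard `statistics` module. The slide's 13.4% matches the sample standard deviation (dividing by n - 1); the population version (dividing by n) gives a smaller value, shown here for contrast.

```python
import statistics

accuracies = [74.2, 90.3, 58.1, 80.6, 90.3]   # the five runs from the slide
mean_acc = statistics.mean(accuracies)
std_acc = statistics.stdev(accuracies)        # sample std dev (n - 1 denominator)
print(f"mean = {mean_acc:.1f}%, std = {std_acc:.1f}%")  # mean = 78.7%, std = 13.4%
print(f"population std = {statistics.pstdev(accuracies):.1f}%")  # population std = 12.0%
```

With only five runs the two estimates differ noticeably, so it is worth stating which one is reported.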

Putting it all together
Suppose you want to measure the expected accuracy of your classifier, assess its learning rate, and measure its variance, all at the same time:

    for (i = 0; i < 10; ++i) {                 // variance reps
        shuffle data
        do 10-way CV partition of data
        for each train/test partition {        // xval
            for (p = 1; p <= 9; ++p) {         // LC: pct = 0.1, 0.2, ..., 0.9
                pct = p / 10.0
                subsample pct fraction of training set
                train on subsample, test on test set
            }
        }
        avg across all folds of CV partition
        generate learning curve for this partition
    }
    get mean and std across all curves
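A runnable Python version of the loop nest above might look like the following sketch. `run_protocol` and the nearest-class-mean toy learner are hypothetical names; the learner is assumed to be supplied as `train`/`predict` callables, each training fraction is drawn as an independent random subsample of the fold's training set, and the final spread is the sample standard deviation across repetitions (so at least two repetitions are needed).

```python
import random
import statistics


def run_protocol(X, y, train, predict, reps=10, k=10,
                 fractions=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), seed=0):
    """Return {fraction: (mean accuracy, std dev across reps)}."""
    if reps < 2:
        raise ValueError("need at least 2 reps to estimate a std dev")
    rng = random.Random(seed)
    curves = []                                      # one learning curve per rep
    for _ in range(reps):                            # variance reps
        idx = list(range(len(X)))
        rng.shuffle(idx)                             # shuffle data
        folds = [idx[i::k] for i in range(k)]        # k-way CV partition
        accs = {frac: [] for frac in fractions}
        for held_out, test_idx in enumerate(folds):  # cross-validation
            train_idx = [j for m, fold in enumerate(folds) if m != held_out for j in fold]
            for frac in fractions:                   # learning curve
                n = max(1, round(frac * len(train_idx)))
                sub = rng.sample(train_idx, n)       # subsample pct of training set
                model = train([X[j] for j in sub], [y[j] for j in sub])
                correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
                accs[frac].append(correct / len(test_idx))
        # Average across all folds: the learning curve for this partition.
        curves.append({frac: statistics.mean(a) for frac, a in accs.items()})
    # Mean and sample std dev across all curves, per training fraction.
    return {frac: (statistics.mean([c[frac] for c in curves]),
                   statistics.stdev([c[frac] for c in curves]))
            for frac in fractions}


# Hypothetical toy learner for 1-D data: predict the class whose mean is closest.
def train_centroids(X, y):
    return {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c) for c in set(y)}


def predict_centroid(model, x):
    return min(model, key=lambda c: abs(x - model[c]))


X = [float(v) for v in range(40)]
y = [int(v >= 20) for v in range(40)]
for frac, (mean, std) in run_protocol(X, y, train_centroids, predict_centroid,
                                      reps=5, k=5).items():
    print(f"{frac:.1f}: {mean:.3f} +/- {std:.3f}")
```

Looping over integers and dividing (as in the corrected pseudocode) avoids the floating-point drift of repeatedly adding 0.1.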

Putting it all together
(Figure: results on the “hepatitis” data.)

5 minutes of math...
Decision trees are non-metric: they don’t know anything about relations between instances, except the sets induced by feature splits. Often, though, we have well-defined distances between points. The idea of distance is encapsulated by a metric.

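5 minutes of math...
Definition: a metric is a function $d(\cdot,\cdot)$ that obeys the following properties for all points $x_a$, $x_b$, $x_c$:
1. Non-negativity: $d(x_a, x_b) \ge 0$
2. Reflexivity: $d(x_a, x_b) = 0$ if and only if $x_a = x_b$
3. Symmetry: $d(x_a, x_b) = d(x_b, x_a)$
4. Triangle inequality: $d(x_a, x_b) + d(x_b, x_c) \ge d(x_a, x_c)$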
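The four metric properties can be spot-checked numerically on a handful of points. The sketch below (hypothetical name `check_metric_axioms`) can only find violations; an empty result on a sample is not a proof that a function is a metric. Squared Euclidean distance is included as a familiar example that breaks the triangle inequality.

```python
import itertools
import math


def check_metric_axioms(d, points, tol=1e-9):
    """Return the names of metric properties that `d` violates on `points`.

    An empty list only means no violation was found on this sample.
    """
    failures = set()
    for a, b in itertools.product(points, repeat=2):
        dab = d(a, b)
        if dab < -tol:
            failures.add("non-negativity")
        if (abs(dab) <= tol) != (a == b):     # zero distance iff same point
            failures.add("reflexivity")
        if abs(dab - d(b, a)) > tol:
            failures.add("symmetry")
    for a, b, c in itertools.product(points, repeat=3):
        if d(a, b) + d(b, c) < d(a, c) - tol:
            failures.add("triangle inequality")
    return sorted(failures)


pts = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (2.0, 5.0)]
print(check_metric_axioms(math.dist, pts))  # []

# Squared Euclidean distance: 1 + 1 < 4 on three collinear points.
line = [(0.0,), (1.0,), (2.0,)]
print(check_metric_axioms(lambda a, b: math.dist(a, b) ** 2, line))  # ['triangle inequality']
```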

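5 minutes of math...
Euclidean distance: $d_E(x_a, x_b) = \sqrt{\sum_{i=1}^{n} (x_{a,i} - x_{b,i})^2}$, where $x_{a,i}$ is the $i$-th of the $n$ features of $x_a$.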

5 minutes of math
(Figure: points $x_a$ and $x_b$, with the straight-line distance $d_E(x_a, x_b)$ between them.)
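The Euclidean distance is the square root of the sum of squared per-feature differences. A minimal sketch (the function name `euclidean` is hypothetical; Python 3.8+ also ships `math.dist`, which computes the same quantity):

```python
import math


def euclidean(xa, xb):
    """Straight-line distance: sqrt of the sum of squared feature differences."""
    if len(xa) != len(xb):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xa, xb)))


print(euclidean((0, 0), (3, 4)))  # 5.0
```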

5 minutes of math...
Manhattan (taxicab) distance: the distance travelled along a grid between two points, with no diagonal moves allowed. Good for integer features.
$d_M(x_a, x_b) = \sum_{i=1}^{n} |x_{a,i} - x_{b,i}|$

5 minutes of math
(Figure: points $x_a$ and $x_b$, with the grid-path distance $d_M(x_a, x_b)$ between them.)
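The Manhattan distance sums the absolute per-feature differences, i.e. the number of unit grid steps between two integer points. A minimal sketch with a hypothetical function name, compared against the straight-line distance for the same pair of points:

```python
import math


def manhattan(xa, xb):
    """Grid distance: sum of absolute feature differences (no diagonal moves)."""
    if len(xa) != len(xb):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(xa, xb))


print(manhattan((0, 0), (3, 4)))  # 7
print(math.dist((0, 0), (3, 4)))  # 5.0
```

The grid path can never be shorter than the straight line, so the Manhattan distance is always at least the Euclidean one.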

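5 minutes of math...
What if some attribute is categorical? The typical answer is the Hamming (sometimes called 0/1) distance: for each attribute, add 1 if the instances differ in that attribute, else 0.
$d_H(x_a, x_b) = \sum_{i=1}^{n} \mathbb{1}[x_{a,i} \ne x_{b,i}]$, where $\mathbb{1}[\cdot]$ is 1 when its argument is true and 0 otherwise.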
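The Hamming distance described above counts the attributes on which two instances disagree. A minimal sketch (the function name `hamming` is hypothetical); it works for any values that support equality, so categorical strings need no numeric encoding:

```python
def hamming(xa, xb):
    """Number of attributes on which the two instances differ."""
    if len(xa) != len(xb):
        raise ValueError("instances must have the same number of attributes")
    return sum(a != b for a, b in zip(xa, xb))


print(hamming(("red", "small", "round"), ("red", "large", "square")))  # 2
```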

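Distances in classification
Nearest-neighbor rule: find the training instance nearest to the query point in feature space, and return the class of that instance. This is the simplest possible distance-based classifier. With more notation, given training data $\{(x_i, y_i)\}_{i=1}^{N}$ and a query point $x$:
$\hat{y}(x) = y_{i^*}, \quad i^* = \arg\min_{i} d(x, x_i)$
The distance here is “whatever’s appropriate to your data”.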
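The nearest-neighbor rule fits in a few lines. In this sketch (hypothetical name `nearest_neighbor`) the distance is a parameter, defaulting to Euclidean via `math.dist`, so that "whatever's appropriate to your data" can be plugged in; here a Hamming distance is swapped in for categorical features. The sketch simply keeps the training set and scans all of it for each query.

```python
import math


def nearest_neighbor(train_X, train_y, query, dist=math.dist):
    """1-NN rule: return the label of the training instance closest to `query`."""
    best = min(range(len(train_X)), key=lambda i: dist(query, train_X[i]))
    return train_y[best]  # ties go to the earliest instance in train_X


train_X = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_y = ["a", "a", "b", "b"]
print(nearest_neighbor(train_X, train_y, (0.4, 0.2)))  # a
print(nearest_neighbor(train_X, train_y, (5.5, 4.0)))  # b


# Categorical features: plug in a Hamming distance instead.
def hamming(xa, xb):
    return sum(a != b for a, b in zip(xa, xb))


pets = [("fur", "bark"), ("feathers", "chirp")]
print(nearest_neighbor(pets, ["dog", "bird"], ("fur", "growl"), dist=hamming))  # dog
```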

Properties of NN
Training time of NN? Classification time? Geometry of the model?

NN miscellany
Slight generalization: k-nearest neighbors (k-NN). Find the k training instances closest to the query point, and vote among them for the label.
Q: How does this affect the system?
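A k-NN classifier with a simple majority vote can be sketched as follows. The function name `knn_classify` is hypothetical, distance defaults to Euclidean, and ties in the vote are broken arbitrarily by `Counter.most_common`; choosing an odd k avoids vote ties in two-class problems. The usage lines show the same query receiving different labels as k changes.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, query, k=3, dist=math.dist):
    """k-NN rule: majority vote among the k training instances closest to `query`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Indices of the k nearest training instances, nearest first.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(query, train_X[i]))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
train_y = ["a", "a", "a", "b", "b"]
query = (4.0, 4.0)
for k in (1, 3, 5):
    print(k, knn_classify(train_X, train_y, query, k=k))
# 1 b
# 3 b
# 5 a
```

With k equal to the whole training set, the vote degenerates into the majority-class default classifier.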
