
1 LECTURE 02: EVALUATING MODELS January 27, 2016 SDS 293 Machine Learning

2 Announcements / Questions Life Sciences and Technology Fair is tomorrow: 3:30-6pm in the Carroll Room www.smith.edu/lazaruscenter/fairs_scitech.php Office hours: does anyone have a conflict?

3 Outline Evaluating Models Lab pt. 1 – Introduction to R: - Basic Commands - Graphics Overview - Indexing Data - Loading Data - Additional Graphical/Numerical Summaries Lab pt. 2 - Exploring other datasets (time permitting)

4 Beyond LR Stated goal of this course: explore methods that go beyond standard linear regression

5 One tool to rule them all…? Question: why not just teach you the best one first?

6 Answer: it depends No single method dominates all others across all possible data sets On a particular data set, for a particular question, one specific method may work well; on a related but not identical data set or question, another might be better Choosing the right approach is arguably the most challenging aspect of doing statistics in practice So how do we do it?

7 Measuring “Quality of Fit” One question we might ask: how well do my model’s predictions actually match the observations? What we need: a way to measure how close the predicted response is to the true response Flashback to your stats training: what do we use in regression?

8 Mean Squared Error In the regression setting, the most commonly-used measure is the mean squared error: \( \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{f}(x_i) \right)^2 \) where \(y_i\) is the true response for the i th observation and \(\hat{f}(x_i)\) is the prediction our model gives for the i th observation: we take the average, over all observations, of the squared difference
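In R (the language used in our labs), this formula collapses to one line. A minimal sketch, using the cars dataset that ships with base R so it runs as-is; the helper name mse is ours, not the book's:

mse <- function(actual, predicted) mean((actual - predicted)^2)  # average squared gap
fit <- lm(dist ~ speed, data = cars)     # any fitted model will do
training_mse <- mse(cars$dist, fitted(fit))  # MSE on the data used to fit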

9 “Training” MSE This version of MSE is computed using the training data that was used to fit the model Reality check: is this what we care about?

10 Test MSE Better plan: see how well the model does on observations we didn’t train on Given some never-before-seen examples, we can just calculate the MSE on those using the same method But what if we don’t have any new observations to test? - Can we just use the training MSE? - Why or why not?
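If no new observations arrive on their own, one workaround is to hold some back ourselves before fitting. A minimal sketch using the built-in cars data; the 70/30 split recipe here is illustrative, and Chapter 5 develops more principled versions of this idea:

set.seed(1)                                   # reproducible split
idx   <- sample(nrow(cars), 0.7 * nrow(cars)) # 70% of row indices
train <- cars[idx, ]                          # fit on these rows...
test  <- cars[-idx, ]                         # ...evaluate on these
fit      <- lm(dist ~ speed, data = train)
test_mse <- mean((test$dist - predict(fit, test))^2)  # MSE on unseen rows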

11 Example [figure: average training MSE and test MSE plotted against model flexibility]

12 Training vs. Test MSE As the flexibility of the statistical learning method increases, we observe: - a monotone decrease in the training MSE - a U-shape in the test MSE Fun fact: this occurs regardless of the data set and the statistical method being used As flexibility increases, training MSE will decrease, but the test MSE may not: when a model chases the training data this closely, we call it overfitting
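A small simulation makes the U-shape visible. This sketch is entirely illustrative: it generates data from a nonlinear truth, fits polynomials of increasing degree (our stand-in for "flexibility"), and prints both MSEs:

set.seed(293)
n <- 100
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)          # nonlinear truth + noise
x_test <- runif(n)
y_test <- sin(2 * pi * x_test) + rnorm(n, sd = 0.3)

for (d in 1:10) {                                  # d = polynomial degree
  fit <- lm(y ~ poly(x, d))
  train_mse <- mean((y - fitted(fit))^2)
  test_mse  <- mean((y_test - predict(fit, data.frame(x = x_test)))^2)
  cat(sprintf("degree %2d: train MSE %.3f | test MSE %.3f\n", d, train_mse, test_mse))
}

Training MSE falls steadily with degree; test MSE falls, bottoms out, then climbs again.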

13 Trade-off between bias and variance The U-shaped curve in the Test MSE is the result of two competing properties: bias and variance Variance refers to the amount by which the model would change if we estimated it using different training data Bias refers to the error that is introduced by approximating a real-life problem (which may be extremely complicated) using a much simpler model

14 Relationship between bias and variance In general, more flexible methods have higher variance

15 Relationship between bias and variance In general, more flexible methods have lower bias

16 Trade-off between bias and variance It is possible to show that the expected test MSE at a given test value \(x_0\) can be decomposed into three terms: \( E\left[ \left( y_0 - \hat{f}(x_0) \right)^2 \right] = \mathrm{Var}\left( \hat{f}(x_0) \right) + \left[ \mathrm{Bias}\left( \hat{f}(x_0) \right) \right]^2 + \mathrm{Var}(\varepsilon) \) the variance of our model at the test value, the squared bias of our model at the test value, and the variance of the error terms
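The decomposition can be checked empirically: refit the model on many simulated training sets and look at the spread (variance) and systematic miss (bias) of its predictions at one test point. A sketch under illustrative assumptions, using a deliberately too-simple linear fit to a nonlinear truth:

set.seed(42)
f  <- function(x) sin(2 * pi * x)         # the "true" function (normally unknown)
x0 <- 0.9                                 # a single test point

preds <- replicate(2000, {                # many hypothetical training sets
  x <- runif(50)
  y <- f(x) + rnorm(50, sd = 0.3)
  predict(lm(y ~ x), data.frame(x = x0))  # linear model's prediction at x0
})

variance <- var(preds)                    # Var(fhat(x0)): spread across training sets
bias_sq  <- (mean(preds) - f(x0))^2       # Bias(fhat(x0))^2: systematic miss
# expected test MSE at x0 is approximately variance + bias_sq + 0.3^2 (irreducible)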

17 Balancing bias and variance We know variance and squared bias are always nonnegative (why?) There’s nothing we can do about the variance of the error terms: it’s the irreducible error inherent in the problem So we’re looking for a method that minimizes the sum of the first two terms… which are (in some sense) competing

18 Balancing bias and variance It’s easy to build a model with low variance but high bias (how?) Just as easy to build one with low bias but high variance (how?) The challenge: finding a method for which both the variance and the squared bias are low This trade-off is one of the most important recurring themes in this course
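To make the two extremes concrete, here is a toy sketch (both predictors invented for illustration): a model that ignores the inputs entirely has essentially zero variance but large bias, while 1-nearest-neighbor has essentially zero training error but large variance:

# extreme 1: always predict the training mean -> ~zero variance, high bias
predict_mean <- function(train_y, new_x) rep(mean(train_y), length(new_x))

# extreme 2: 1-nearest-neighbor -> ~zero bias on training data, high variance
predict_1nn <- function(train_x, train_y, new_x)
  sapply(new_x, function(x) train_y[which.min(abs(train_x - x))])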

19 What about classification? So far, we’ve only talked about how to evaluate the accuracy of a regression model The idea of a bias-variance trade-off also translates to the classification setting, but we need some minor modifications to deal with qualitative responses For example: we can’t really compute MSE without numerical values, so what can we do instead?

20 Training error rate One common approach is to use the training error rate: the proportion of times our model incorrectly classifies a training data point: \( \frac{1}{n} \sum_{i=1}^{n} I\left( y_i \neq \hat{y}_i \right) \) Using the indicator function \(I(y_i \neq \hat{y}_i)\), we tally up all the times the model’s classification was different from the actual class, and take the average
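In R the indicator-plus-average recipe collapses to one line, because the mean of a logical vector is the proportion of TRUEs. A toy sketch (labels invented for illustration):

predicted <- c("yes", "no", "no", "yes", "no")
actual    <- c("yes", "no", "yes", "yes", "yes")
# (predicted != actual) plays the role of the indicator I(y_i != yhat_i)
training_error_rate <- mean(predicted != actual)  # 2 mistakes / 5 points = 0.4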

21 Takeaways Choosing the “right” level of flexibility is critical for success in both the regression and classification settings The bias-variance tradeoff can make this a difficult task In Chapter 5, we’ll return to this topic and explore various methods for estimating test error rates We’ll then use these estimates to find the optimal level of flexibility for a given ML method

22 Questions?

23 Lab pt. 1: Introduction to R Basic Commands Graphics Indexing data Loading external data Generating summaries Playing with real data (time permitting!)

24 Lab pt. 1: Introduction to R
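For quick reference, a few of the commands this lab touches, one per topic from the outline. A sketch only: the in-class notebook has the full walkthrough, and "data.csv" is a placeholder path:

x <- c(1, 4, 9, 16)          # basic commands: build a numeric vector
sqrt(x)                      # functions are vectorized
m <- matrix(1:6, nrow = 2)   # indexing data: a 2x3 matrix
m[1, 2]                      # element in row 1, column 2
m[, 3]                       # an entire column
plot(x, sqrt(x))             # graphics: a quick scatterplot
df <- read.csv("data.csv")   # loading external data (placeholder file)
summary(df)                  # numerical summaries of each column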


27 Today’s walkthrough (and likely many others) will be run using Jupyter, which allows me to build “notebooks” that run live R code (Python, too!) in the browser Hint: this is also a nice way to format your homework!

28 Lab pt. 2: Exploring Other Datasets More datasets from the book - ISLR package - Already installed on Smith RStudio server - Working locally? > install.packages('ISLR') - Details available at: cran.r-project.org/web/packages/ISLR - Dataset descriptions: www.inside-r.org/packages/cran/ISLR/docs Real world data: - Olympic Athletes: goo.gl/1aUnJW - World Bank Indicators: goo.gl/0QdN9U - Airplane Bird Strikes: goo.gl/lFl5ld - …and a whole bunch more: goo.gl/kcbqfc
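Once the package is installed, any of the book's datasets is one library() call away. A short sketch using the Auto data from ISLR:

library(ISLR)       # loads the book's datasets
data(Auto)          # cars dataset used throughout ISL
dim(Auto)           # observations x variables
summary(Auto$mpg)   # numerical summary of fuel economy
pairs(Auto[, 1:4])  # scatterplot matrix of the first four columns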

29 Coming Up Next class: Linear Regression 1: Simple and Multiple LR For planning purposes: Assignment 1 will be posted next week, and will be due the following Wednesday (Feb. 10th)

