ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Regression –Model selection, Cross-validation –Error decomposition –Bias-Variance Tradeoff Readings: Barber 17.1, 17.2
Administrativia HW1 –Solutions available Project Proposal –Due: Tue 02/24, 11:55 pm –<=2pages, NIPS format –Show Igor’s proposal HW2 –Due: Friday 03/06, 11:55pm –Implement linear regression, Naïve Bayes, Logistic Regression (C) Dhruv Batra2
Recap of last time (C) Dhruv Batra3
Regression (C) Dhruv Batra4
5Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra6Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra7Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra8Slide Credit: Greg Shakhnarovich
Why sum squared error??? Gaussians, Watson, Gaussians… But, why? 9(C) Dhruv Batra
10Slide Credit: Greg Shakhnarovich
Is OLS Robust? Demo – Bad things happen when the data does not come from your model! How do we fix this? (C) Dhruv Batra11
Robust Linear Regression y ~ Lap(w’x, b) On paper (C) Dhruv Batra12
Plan for Today (Finish) Regression –Bayesian Regression –Different prior vs likelihood combination –Polynomial Regression Error Decomposition –Bias-Variance –Cross-validation (C) Dhruv Batra13
Robustify via Prior Ridge Regression y ~ N(w’x, σ 2 ) w ~ N(0, t 2 I) P(w | x,y) = (C) Dhruv Batra14
Summary LikelihoodPriorName GaussianUniformLeast Squares Gaussian Ridge Regression GaussianLaplaceLasso LaplaceUniformRobust Regression StudentUniformRobust Regression (C) Dhruv Batra15
(C) Dhruv Batra16Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra17Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra18Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra19Slide Credit: Greg Shakhnarovich
Example Demo – (C) Dhruv Batra20
What you need to know Linear Regression –Model –Least Squares Objective –Connections to Max Likelihood with Gaussian Conditional –Robust regression with Laplacian Likelihood –Ridge Regression with priors –Polynomial and General Additive Regression (C) Dhruv Batra21
New Topic: Model Selection and Error Decomposition (C) Dhruv Batra22
Example for Regression Demo – How do we pick the hypothesis class? (C) Dhruv Batra23
Model Selection How do we pick the right model class? Similar questions –How do I pick magic hyper-parameters? –How do I do feature selection? (C) Dhruv Batra24
Errors Expected Loss/Error Training Loss/Error Validation Loss/Error Test Loss/Error Reporting Training Error (instead of Test) is CHEATING Optimizing parameters on Test Error is CHEATING (C) Dhruv Batra25
(C) Dhruv Batra26Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra27Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra28Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra29Slide Credit: Greg Shakhnarovich
(C) Dhruv Batra30Slide Credit: Greg Shakhnarovich
Typical Behavior (C) Dhruv Batra31 a
Overfitting Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w’ such that: 32(C) Dhruv BatraSlide Credit: Carlos Guestrin
Error Decomposition (C) Dhruv Batra33 Reality Modeling Error model class Estimation Error Optimization Error
Error Decomposition (C) Dhruv Batra34 Reality Estimation Error Optimization Error Modeling Error model class
Error Decomposition (C) Dhruv Batra35 Reality Modeling Error model class Estimation Error Optimization Error Higher-Order Potentials
Error Decomposition Approximation/Modeling Error –You approximated reality with model Estimation Error –You tried to learn model with finite data Optimization Error –You were lazy and couldn’t/didn’t optimize to completion (Next time) Bayes Error –Reality just sucks (C) Dhruv Batra36