ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Model selection –Error decomposition –Bias-Variance Tradeoff –Classification: Naïve Bayes
ECE 5984: Introduction to Machine Learning Dhruv Batra Virginia Tech Topics: –(Finish) Model selection –Error decomposition –Bias-Variance Tradeoff –Classification: Naïve Bayes Readings: Barber 17.1, 17.2

Administrativia HW2 –Due: Friday 03/06, 11:55pm –Implement linear regression, Naïve Bayes, Logistic Regression Need a couple of catch-up lectures –How about 4-6pm? (C) Dhruv Batra

Administrativia Mid-term –When: March 18, class timing –Where: In class –Format: Pen-and-paper –Open-book, open-notes, closed-internet. No sharing. –What to expect: a mix of multiple-choice or true/false questions, “Prove this statement”, “What would happen for this dataset?” –Material: Everything from the beginning of class up to (and including) SVMs (C) Dhruv Batra

Recap of last time (C) Dhruv Batra

Regression (C) Dhruv Batra

(Image-only slides) Slide Credit: Greg Shakhnarovich

What you need to know Linear Regression –Model –Least Squares Objective –Connections to Max Likelihood with Gaussian Conditional –Robust regression with Laplacian Likelihood –Ridge Regression with priors –Polynomial and General Additive Regression (C) Dhruv Batra
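The least-squares and ridge solutions summarized above can be sketched in a few lines. This is an illustrative snippet (my own, not the course code): the synthetic dataset and the `lam` value are hypothetical.

```python
# Minimal sketch: ordinary least squares and ridge regression
# via their closed-form normal equations.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data: y = 2x + 1 + noise
X = rng.uniform(-1, 1, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.standard_normal(50)

# Add a bias column so the intercept is learned as a weight.
Phi = np.hstack([np.ones((X.shape[0], 1)), X])

# OLS: w = (Phi^T Phi)^{-1} Phi^T y
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Ridge: w = (Phi^T Phi + lam * I)^{-1} Phi^T y
# (equivalent to MAP estimation with a Gaussian prior on w)
lam = 0.1
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
```

Both recover weights near the true intercept 1 and slope 2; ridge shrinks them slightly toward zero.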

Plan for Today (Finish) Model Selection –Overfitting vs Underfitting –Bias-Variance trade-off, aka Modeling error vs Estimation error tradeoff Naïve Bayes (C) Dhruv Batra

New Topic: Model Selection and Error Decomposition (C) Dhruv Batra

Example for Regression Demo – How do we pick the hypothesis class? (C) Dhruv Batra

Model Selection How do we pick the right model class? Similar questions –How do I pick magic hyper-parameters? –How do I do feature selection? (C) Dhruv Batra
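One standard answer to the hyper-parameter question is k-fold cross-validation. Here is a hedged sketch (synthetic data, an illustrative candidate grid) that picks a ridge-regression lambda by minimizing average held-out error:

```python
# Sketch: choosing a hyper-parameter (ridge lambda) by k-fold CV.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.5 * rng.standard_normal(60)

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def cv_error(X, y, lam, k=5):
    """Average validation MSE over k folds."""
    idx = np.arange(len(y))
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)       # everything not in this fold
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((X[fold] @ w - y[fold]) ** 2))
    return np.mean(errs)

lams = [0.01, 0.1, 1.0, 10.0, 100.0]          # hypothetical grid
best_lam = min(lams, key=lambda l: cv_error(X, y, l))
```

The key point: the hyper-parameter is scored on data the fit never saw, never on the test set.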

Errors Expected Loss/Error Training Loss/Error Validation Loss/Error Test Loss/Error Reporting Training Error (instead of Test) is CHEATING Optimizing parameters on Test Error is CHEATING (C) Dhruv Batra

(Image-only slides) (C) Dhruv Batra Slide Credit: Greg Shakhnarovich

Typical Behavior (C) Dhruv Batra

Overfitting Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w′ such that: error_train(w) < error_train(w′) but error_true(w) > error_true(w′) (C) Dhruv Batra Slide Credit: Carlos Guestrin
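This definition can be seen numerically. The sketch below (my own, not the lecture's demo) fits degree-1 and degree-9 polynomials to the same small noisy sample drawn around a sine curve; all constants are illustrative:

```python
# Over-fitting demo: a high-degree polynomial drives training error
# down while its test error stays above it.
import numpy as np

rng = np.random.default_rng(2)
x_train = np.linspace(-1, 1, 12)
y_train = np.sin(np.pi * x_train) + 0.2 * rng.standard_normal(12)
x_test = np.linspace(-0.95, 0.95, 100)
y_test = np.sin(np.pi * x_test) + 0.2 * rng.standard_normal(100)

def fit_and_errors(degree):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_err, test_err

train1, test1 = fit_and_errors(1)   # under-fit: high train error
train9, test9 = fit_and_errors(9)   # over-fit: tiny train error
```

Degree 9 always achieves lower training error than degree 1 (it is a strictly larger hypothesis class), yet its test error stays well above its own training error.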

Error Decomposition (Diagram slides: Reality vs. the model class, decomposing total error into Modeling Error, Estimation Error, and Optimization Error; a later build adds Higher-Order Potentials) (C) Dhruv Batra

Error Decomposition Approximation/Modeling Error –You approximated reality with model Estimation Error –You tried to learn model with finite data Optimization Error –You were lazy and couldn’t/didn’t optimize to completion (Next time) Bayes Error –Reality just sucks (C) Dhruv Batra

Bias-Variance Tradeoff Bias: difference between what you expect to learn and truth –Measures how well you expect to represent true solution –Decreases with more complex model Variance: difference between what you expect to learn and what you learn from a particular dataset –Measures how sensitive learner is to specific dataset –Increases with more complex model (C) Dhruv Batra Slide Credit: Carlos Guestrin
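A small simulation makes the two definitions concrete. This sketch (illustrative: true function f(x) = x², hypothetical noise level) repeatedly refits a too-simple constant model and a quadratic model on fresh noisy datasets, then compares bias and variance of the prediction at x = 1:

```python
# Bias-variance simulation: simple model = high bias / low variance,
# complex model = low bias / higher variance.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 20)          # fixed design points
f = x ** 2                          # true function; f(1) = 1
sigma = 0.3
trials = 500

preds_simple, preds_complex = [], []
for _ in range(trials):
    y = f + sigma * rng.standard_normal(x.size)
    preds_simple.append(np.mean(y))                        # degree-0 fit
    preds_complex.append(np.polyval(np.polyfit(x, y, 2), 1.0))

bias_simple = abs(np.mean(preds_simple) - 1.0)
bias_complex = abs(np.mean(preds_complex) - 1.0)
var_simple = np.var(preds_simple)
var_complex = np.var(preds_complex)
```

The constant model's average prediction sits near mean(x²) ≈ 0.37, far from the truth f(1) = 1 (high bias) but barely moves across datasets (low variance); the quadratic model shows the reverse.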

Bias-Variance Tradeoff Matlab demo (C) Dhruv Batra

Bias-Variance Tradeoff Choice of hypothesis class introduces learning bias –More complex class → less bias –More complex class → more variance (C) Dhruv Batra Slide Credit: Carlos Guestrin

(Image-only slide) (C) Dhruv Batra Slide Credit: Greg Shakhnarovich

Learning Curves Error vs size of dataset On board –High-bias curves –High-variance curves (C) Dhruv Batra
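The on-board curves can be reproduced with a quick simulation. This sketch (my own, assuming a fixed linear model on synthetic linear data) averages train and test error over many random datasets at each size:

```python
# Learning-curve sketch: as the training set grows, training error
# rises toward the noise floor while test error falls toward it.
import numpy as np

rng = np.random.default_rng(4)
sigma = 0.5

def errors(n_train, reps=300):
    """Mean train/test MSE of a line fit, averaged over reps datasets."""
    tr, te = [], []
    for _ in range(reps):
        x = rng.uniform(-1, 1, n_train)
        y = 2 * x + sigma * rng.standard_normal(n_train)
        c = np.polyfit(x, y, 1)
        tr.append(np.mean((np.polyval(c, x) - y) ** 2))
        xt = rng.uniform(-1, 1, 200)
        yt = 2 * xt + sigma * rng.standard_normal(200)
        te.append(np.mean((np.polyval(c, xt) - yt) ** 2))
    return np.mean(tr), np.mean(te)

train_small, test_small = errors(5)
train_large, test_large = errors(200)
```

With 5 points the line nearly interpolates (low train error, high test error); with 200 points both errors converge toward the noise variance sigma².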

Debugging Machine Learning My algorithm doesn’t work –High test error What should I do? –More training data –Smaller set of features –Larger set of features –Lower regularization –Higher regularization (C) Dhruv Batra

What you need to know Generalization Error Decomposition –Approximation, estimation, optimization, Bayes error –For squared losses, bias-variance tradeoff Errors –Difference between train & test error & expected error –Cross-validation (and cross-val error) –NEVER EVER learn on test data Overfitting vs Underfitting (C) Dhruv Batra

New Topic: Naïve Bayes (your first probabilistic classifier) (C) Dhruv Batra (Diagram: features x → Classification → discrete label y)

Classification Learn: h: X → Y –X – features –Y – target classes Suppose you know P(Y|X) exactly, how should you classify? –Bayes classifier: h(x) = argmax_y P(Y = y | X = x) Why? Slide Credit: Carlos Guestrin

Optimal classification Theorem: the Bayes classifier h_Bayes is optimal! –That is, error_true(h_Bayes) ≤ error_true(h) for any classifier h Proof: Slide Credit: Carlos Guestrin
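As a toy illustration of the decision rule (all numbers hypothetical): when P(Y|X) is known exactly, the Bayes classifier simply thresholds the posterior.

```python
# Bayes classifier on a tiny discrete problem: predict
# argmax_y P(Y = y | X = x) for each feature value x.
# Hypothetical posterior P(Y=1 | X=x) for three feature values:
p_y1_given_x = {0: 0.9, 1: 0.2, 2: 0.6}

def bayes_classify(x):
    """Return the most probable label under the known posterior."""
    p1 = p_y1_given_x[x]
    return 1 if p1 >= 0.5 else 0
```

No classifier can beat this rule in expected 0/1 error, since for every x it picks the label with the larger posterior probability.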

Generative vs. Discriminative Generative Approach –Estimate p(x|y) and p(y) –Use Bayes Rule to predict y Discriminative Approach –Estimate p(y|x) directly OR –Learn “discriminant” function h(x) (C) Dhruv Batra

Generative vs. Discriminative Generative Approach –Assume some functional form for P(X|Y), P(Y) –Estimate p(X|Y) and p(Y) –Use Bayes Rule to calculate P(Y|X=x) –Indirect computation of P(Y|X) through Bayes rule –But, can generate a sample, P(X) =  y P(y) P(X|y) Discriminative Approach –Estimate p(y|x) directly OR –Learn “discriminant” function h(x) –Direct but cannot obtain a sample of the data, because P(X) is not available (C) Dhruv Batra

Generative vs. Discriminative Generative: –Today: Naïve Bayes Discriminative: –Next: Logistic Regression NB & LR are related to each other. (C) Dhruv Batra

How hard is it to learn the optimal classifier? Slide Credit: Carlos Guestrin Categorical Data How do we represent these? How many parameters? –Class-Prior, P(Y): Suppose Y is composed of k classes –Likelihood, P(X|Y): Suppose X is composed of d binary features Complex model → High variance with limited data!!!

Independence to the rescue (C) Dhruv Batra Slide Credit: Sam Roweis

The Naïve Bayes assumption Naïve Bayes assumption: –Features are independent given class: P(X1, X2 | Y) = P(X1 | Y) P(X2 | Y) –More generally: P(X1, …, Xd | Y) = ∏_i P(Xi | Y) How many parameters now? Suppose X is composed of d binary features Slide Credit: Carlos Guestrin (C) Dhruv Batra
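To make the parameter-count answer explicit, a small sketch (my own tally, assuming d binary features and k classes): the full conditional table P(X|Y) needs k·(2^d − 1) free parameters, while the NB factorization needs only k·d, plus k − 1 for the class prior in both cases.

```python
# Parameter counts: full joint conditional table vs. Naive Bayes,
# for d binary features and k classes.
def full_joint_params(d, k):
    # (k - 1) prior probabilities + k tables over 2^d outcomes each,
    # with one outcome per table determined by normalization.
    return (k - 1) + k * (2 ** d - 1)

def naive_bayes_params(d, k):
    # (k - 1) prior probabilities + one Bernoulli P(X_i = 1 | Y = c)
    # per feature per class.
    return (k - 1) + k * d

d, k = 30, 2
full = full_joint_params(d, k)   # over two billion parameters
nb = naive_bayes_params(d, k)    # 61 parameters
```

The gap (exponential vs. linear in d) is exactly why the full model has "high variance with limited data" and NB does not.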

The Naïve Bayes Classifier Slide Credit: Carlos Guestrin (C) Dhruv Batra Given: –Class-Prior P(Y) –d conditionally independent features X given the class Y –For each Xi, we have likelihood P(Xi|Y) Decision rule: y* = argmax_y P(y) ∏_i P(xi | y) If assumption holds, NB is optimal classifier!
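Putting the pieces together, here is a minimal Bernoulli Naïve Bayes sketch (my own illustrative implementation with add-one Laplace smoothing, not the course reference code); the tiny dataset is hypothetical:

```python
# Bernoulli Naive Bayes: fit per-class priors and per-feature
# Bernoulli likelihoods, predict via argmax of the log posterior.
import numpy as np

def nb_fit(X, y):
    """X: (n, d) binary array, y: (n,) labels."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    # P(X_i = 1 | Y = c) with add-one (Laplace) smoothing.
    likes = {c: (X[y == c].sum(axis=0) + 1.0) / ((y == c).sum() + 2.0)
             for c in classes}
    return priors, likes

def nb_predict(priors, likes, x):
    best, best_score = None, -np.inf
    for c in priors:
        p = likes[c]
        # log P(y) + sum_i log P(x_i | y)  -- the NB factorization
        score = np.log(priors[c]) + np.sum(
            x * np.log(p) + (1 - x) * np.log(1 - p))
        if score > best_score:
            best, best_score = c, score
    return best

X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
priors, likes = nb_fit(X, y)
```

Working in log space avoids underflow when d is large; smoothing keeps unseen feature values from zeroing out an entire class posterior.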