Discriminative and Generative Classifiers


Discriminative and Generative Classifiers
Tom Mitchell
Statistical Approaches to Learning and Discovery, 10-702 and 15-802
March 19, 2003
Lecture based on "On Discriminative vs. Generative classifiers: A comparison of logistic regression and naïve Bayes," A. Ng and M. Jordan, NIPS 2002.

Lecture Outline
- Generative and discriminative classifiers
- Asymptotic comparison (as # examples grows)
  - when model correct
  - when model incorrect
- Non-asymptotic analysis
  - convergence of parameter estimates
  - convergence of expected error
- Experimental results

Generative vs. Discriminative Classifiers
Training classifiers involves estimating f: X → Y, or P(Y|X).
Discriminative classifiers:
- Assume some functional form for P(Y|X)
- Estimate the parameters of P(Y|X) directly from training data
Generative classifiers (called 'informative' by Rubinstein & Hastie):
- Assume some functional form for P(X|Y) and P(Y)
- Estimate the parameters of P(X|Y) and P(Y) directly from training data
- Use Bayes rule to calculate P(Y|X = xi)

Generative-Discriminative Pairs
Example: assume Y is boolean and X = <x1, x2, …, xn>, where the xi are boolean, perhaps dependent on one another, but conditionally independent given Y.
Generative model: naïve Bayes. Classify a new example x based on the ratio P(Y=1|x) / P(Y=0|x), or equivalently on the sign of the log of this ratio. The parameters are smoothed count estimates, where s indicates the size of the set and l is the smoothing parameter.
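A minimal sketch of the generative side, under the assumptions of this slide (boolean features, conditionally independent given Y). All names are mine: `alpha` plays the role of the smoothing parameter l, and the per-class counts play the role of the set size s.

```python
import numpy as np

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate smoothed Bernoulli naive Bayes parameters.

    alpha is the smoothing parameter (the slide's l); the class counts
    act as the set sizes (the slide's s)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    priors, cond = {}, {}
    for c in (0, 1):
        Xc = X[y == c]
        priors[c] = len(Xc) / len(X)
        # P(x_i = 1 | Y = c) with add-alpha smoothing over the two outcomes
        cond[c] = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)
    return priors, cond

def log_odds(x, priors, cond):
    """log P(Y=1|x) / P(Y=0|x); classify by its sign."""
    x = np.asarray(x, dtype=float)
    s = np.log(priors[1]) - np.log(priors[0])
    for c, sign in ((1, 1.0), (0, -1.0)):
        p = cond[c]
        s += sign * float((x * np.log(p) + (1 - x) * np.log(1 - p)).sum())
    return s

priors, cond = fit_bernoulli_nb([[1, 0], [1, 1], [0, 0], [0, 1]], [1, 1, 0, 0])
print(log_odds([1, 0], priors, cond))  # positive, so predict Y = 1
```

On this toy data the first feature carries all the signal, so the log-odds is positive exactly when x1 = 1.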

Generative-Discriminative Pairs (continued)
Same setup: Y boolean, X = <x1, x2, …, xn>, with the xi boolean and conditionally independent given Y.
Generative model: naïve Bayes, which classifies a new example x based on the ratio P(Y=1|x) / P(Y=0|x).
Discriminative model: logistic regression, which assumes a functional form for P(Y|X) directly.
Note that both learn a linear decision surface over X in this case.
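The "both are linear" remark can be made concrete. Writing θ_ic = P(x_i = 1 | Y = c) and π_c = P(Y = c) (notation mine, since the slide's equations are not reproduced), the naïve Bayes log-odds is linear in x:

```latex
\log\frac{P(Y=1\mid x)}{P(Y=0\mid x)}
  = \log\frac{\pi_1}{\pi_0}
  + \sum_{i=1}^{n}\left[x_i\log\frac{\theta_{i1}}{\theta_{i0}}
  + (1-x_i)\log\frac{1-\theta_{i1}}{1-\theta_{i0}}\right]
  = w_0 + \sum_{i=1}^{n} w_i x_i,
\qquad
w_i = \log\frac{\theta_{i1}\,(1-\theta_{i0})}{\theta_{i0}\,(1-\theta_{i1})}.

\text{Logistic regression posits this form directly:}\quad
P(Y=1\mid x) = \frac{1}{1+\exp\!\big(-(w_0 + \textstyle\sum_i w_i x_i)\big)}.
```

The difference is how the weights are obtained: naïve Bayes derives them from class-conditional estimates, while logistic regression fits them by maximizing the conditional likelihood of Y given X.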

What is the difference asymptotically?
Notation: let ε(h_{A,m}) denote the error of the hypothesis learned via algorithm A from m examples.
If the assumed model is correct (e.g., the naïve Bayes model holds), and it has a finite number of parameters, then both classifiers converge to the same asymptotic error: ε(h_{Gen,∞}) = ε(h_{Dis,∞}).
If the assumed model is incorrect, the discriminative classifier does no worse in the limit: ε(h_{Dis,∞}) ≤ ε(h_{Gen,∞}).
Note the assumed discriminative model can be correct even when the generative model is incorrect, but not vice versa.

Rate of convergence: logistic regression
Let h_{Dis,m} be logistic regression trained on m examples in n dimensions. Then with high probability:
ε(h_{Dis,m}) ≤ ε(h_{Dis,∞}) + O( sqrt( (n/m) log(m/n) ) )
Implication: if we want ε(h_{Dis,m}) ≤ ε(h_{Dis,∞}) + ε0 for some constant ε0, it suffices to pick m = O(n).
So logistic regression converges to the best linear classifier in order of n examples. (The result follows from Vapnik's structural risk bound, plus the fact that the VC dimension of n-dimensional linear separators is n + 1.)
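The O(n) sample complexity can be read off the uniform-convergence bound sqrt((n/m) log(m/n)); a sketch of the algebra, with constants suppressed:

```latex
\sqrt{\frac{n}{m}\log\frac{m}{n}} \le \epsilon_0
\quad\Longleftrightarrow\quad
\frac{m}{n} \;\ge\; \frac{1}{\epsilon_0^{2}}\log\frac{m}{n}.
```

Writing u = m/n, the condition u ≥ (1/ε0²) log u is met by some constant u* depending only on ε0, since u grows faster than log u. Hence m = u*·n = O(n) examples suffice for any fixed ε0.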

Rate of convergence: naïve Bayes
Consider first how quickly the parameter estimates converge toward their asymptotic values. Then we'll ask how this influences the rate of convergence toward the asymptotic classification error.

Rate of convergence: naïve Bayes parameters
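The slide's equations are not reproduced in this transcript; the flavor of the argument in Ng & Jordan (2002) is a Chernoff bound plus a union bound. Each naïve Bayes parameter is an empirical frequency, so for a single estimate:

```latex
\Pr\!\left[\,\lvert\hat\theta - \theta\rvert > \epsilon\,\right] \;\le\; 2\,e^{-2\epsilon^{2} m}.

\text{A union bound over the } O(n) \text{ parameters gives, with probability} \ge 1-\delta,
\text{ all estimates within } \epsilon \text{ once}\quad
m = O\!\left(\frac{1}{\epsilon^{2}}\,\log\frac{n}{\delta}\right).
```

That is, the naïve Bayes parameters converge to within ε of their asymptotic values after only O(log n) examples, versus the O(n) examples logistic regression needs.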

Rate of convergence: naïve Bayes classification error
See blackboard.

Some experiments from UCI data sets

Pairs of plots comparing naïve Bayes and logistic regression with a quadratic regularization penalty. The left plots show training error vs. number of examples; the right plots show test error. Each row uses a different regularization penalty: the top row uses a small penalty, and the penalty increases as you move down the page. Thanks to John Lafferty.
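The learning-curve comparison can be reproduced qualitatively on synthetic data. This is a self-contained sketch, not the experiments from the paper: the data generator, step sizes, and all names are mine, and the logistic regression here is plain gradient ascent without the regularization penalty used in the plots.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(m, n=20):
    """Boolean features whose per-feature means differ by class."""
    y = np.arange(m) % 2                      # balanced labels
    p = np.where(y[:, None] == 1, 0.65, 0.35)  # P(x_i = 1 | y)
    X = (rng.random((m, n)) < p).astype(float)
    return X, y

def nb_fit(X, y, a=1.0):
    """Smoothed Bernoulli naive Bayes estimates."""
    th = [(X[y == c].sum(0) + a) / ((y == c).sum() + 2 * a) for c in (0, 1)]
    pri = [(y == c).mean() for c in (0, 1)]
    return pri, th

def nb_predict(X, pri, th):
    s = np.log(pri[1] / pri[0])
    s = s + X @ np.log(th[1] / th[0]) + (1 - X) @ np.log((1 - th[1]) / (1 - th[0]))
    return (s > 0).astype(int)

def lr_fit(X, y, lr=0.5, steps=300):
    """Logistic regression by batch gradient ascent on the log-likelihood."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (y - p) / len(X)
    return w

def lr_predict(X, w):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return (Xb @ w > 0).astype(int)

Xte, yte = make_data(5000)
for m in (10, 50, 500):
    Xtr, ytr = make_data(m)
    pri, th = nb_fit(Xtr, ytr)
    nb_err = (nb_predict(Xte, pri, th) != yte).mean()
    lr_err = (lr_predict(Xte, lr_fit(Xtr, ytr)) != yte).mean()
    print(f"m={m:4d}  NB err={nb_err:.3f}  LR err={lr_err:.3f}")
```

On runs like this one typically sees the pattern the paper reports: naïve Bayes approaches its asymptotic error with very few examples, while logistic regression needs more data but often matches or beats it once m is large.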