Bayes Classifier, Linear Regression 10701/15781 Recitation January 29, 2008 Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

Classification and Regression  Classification  Goal: Learn the underlying function f: X (features)  Y (class, or category) e.g. words  “spam”, or “not spam”  Regression f: X (features)  Y (continuous values) e.g. GPA  salary

Supervised Classification  How to find an unknown function f: X  Y (features  class) or equivalently P(Y|X)  Classifier: 1. Find P(X|Y), P(Y), and use Bayes rule - generative 2. Find P(Y|X) directly - discriminative

Classification: Learn P(Y|X)
1. Bayes rule: P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y)
   - Learn P(X|Y) and P(Y) - a "generative" classifier
2. Learn P(Y|X) directly - a "discriminative" classifier (to be covered later in class), e.g. logistic regression

Generative Classifier: Bayes Classifier
Learn P(X|Y) and P(Y).
- Example classification problem:
  - 3 classes for Y = {spam, not spam, maybe}
  - 10,000 binary features for X = {"Cash", "Rolex", ...}
- How many parameters do we have?
  - P(Y):
  - P(X|Y):
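One way to count, treating P(X|Y) as a full joint table over the 10,000 binary features: P(Y) needs 3 − 1 = 2 free parameters, while P(X|Y) needs 2^10000 − 1 parameters per class, i.e. 3 × (2^10000 − 1) in total - far too many to estimate from any realistic amount of data.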

Generative Learning: Naïve Bayes
- Introduce conditional independence: P(X1, X2|Y) = P(X1|Y) P(X2|Y)
- For X = (X1, ..., Xn):
  P(Y|X) = P(X|Y) P(Y) / P(X)
         = P(X1|Y) ... P(Xn|Y) P(Y) / P(X)
         = [ ∏_i P(Xi|Y) ] P(Y) / P(X)
- Learn P(X1|Y), ..., P(Xn|Y) and P(Y) instead of learning P(X1, ..., Xn|Y) directly
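A minimal NumPy sketch of this factorization for binary features (function names and the Laplace smoothing constant are illustrative, not from the recitation):

import numpy as np

def train_naive_bayes(X, y, n_classes, alpha=1.0):
    """Estimate P(Y=c) and P(Xi=1|Y=c) from binary features X (n x d) and labels y."""
    n, d = X.shape
    prior = np.zeros(n_classes)        # P(Y=c)
    cond = np.zeros((n_classes, d))    # P(Xi=1 | Y=c)
    for c in range(n_classes):
        Xc = X[y == c]
        prior[c] = (len(Xc) + alpha) / (n + n_classes * alpha)
        cond[c] = (Xc.sum(axis=0) + alpha) / (len(Xc) + 2 * alpha)  # Laplace smoothing
    return prior, cond

def predict_naive_bayes(x, prior, cond):
    """Return argmax_c P(Y=c) * prod_i P(Xi=xi|Y=c), computed in log space."""
    log_post = np.log(prior) + (np.log(cond) * x + np.log(1 - cond) * (1 - x)).sum(axis=1)
    return np.argmax(log_post)

Working in log space avoids underflow when multiplying 10,000 per-feature probabilities.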

Naïve Bayes  3 classes for Y = {spam, not spam, maybe}  10,000 binary features for X = {“Cash”,”Rolex”,…}  Now, how many parameters?  P(Y)  P(X|Y) fewer parameters “simpler” – less likely to overfit

Full Bayes vs. Naïve Bayes  XOR X1X2Y P(Y=1|(X1,X2)=(0,1))=?  Full Bayes: P(Y=1)=? P((X1,X2)=(0,1)|Y=1)=?  Naïve Bayes: P(Y=1)=? P((X1,X2)=(0,1)|Y=1)=?

Regression  Prediction of continuous variables  e.g. I want to predict salaries from GPA.   I can regress that …  Learn the mapping f: X  Y  Model is linear in the parameters (+ some noise)  linear regression  Assume Gaussian noise  Learn MLE Θ

1-parameter linear regression
- Normal linear regression, or equivalently,
- MLE of Θ?
- MLE of σ²?
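One standard way to set this up, assuming the no-intercept model implied by "1-parameter": y_i = Θ x_i + ε_i with ε_i ~ N(0, σ²), or equivalently y_i ~ N(Θ x_i, σ²). Maximizing the likelihood is then equivalent to minimizing Σ_i (y_i − Θ x_i)², which gives Θ_MLE = Σ_i x_i y_i / Σ_i x_i², and σ²_MLE = (1/n) Σ_i (y_i − Θ_MLE x_i)².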

Multivariate linear regression  What if the inputs are vectors?  Write matrix X and Y : (n data points, k features for each data)  MLE Θ =

Constant term?  We may expect linear data that does not go through the origin  Trick?

The constant term

Regression: another example  Assume the following model to fit the data. The model has one unknown parameter θ to be learned from data.  A maximum likelihood estimation of θ?