Linear Methods for Classification: Presentation for the MA seminar in statistics, Eli Dahan
Outline
- Introduction: problem and solution
- LDA: Linear Discriminant Analysis
- LR: Logistic Regression (and Linear Regression)
- LDA vs. LR
- In a word – Separating Hyperplanes
Introduction - the problem
Given an observation X, does it belong to group k or to group l? We can think of G as the "group label". The posterior probability is $p_j(x) = P(G = j \mid X = x)$.
Introduction - the solution
Linear decision boundary: the set $\{x : p_k(x) = p_l(x)\}$. If $p_k(x) > p_l(x)$ choose class k; if $p_l(x) > p_k(x)$ choose class l.
Linear Discriminant Analysis
Let $P(G = k) = \pi_k$ and $P(X = x \mid G = k) = f_k(x)$. Then by Bayes' rule:
$$P(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l}$$
The decision boundary between classes k and l is the set where $P(G = k \mid X = x) = P(G = l \mid X = x)$.
Linear Discriminant Analysis
Assuming $f_k(x) \sim N(\mu_k, \Sigma_k)$ with a common covariance matrix $\Sigma_1 = \Sigma_2 = \dots = \Sigma_K = \Sigma$, we get a decision boundary that is linear in x. For covariances that are not common we get QDA (and RDA as a regularized compromise).
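The slide's equation was not preserved; a sketch of the standard derivation showing why the boundary is linear, via the log posterior ratio of two classes:
```latex
\log\frac{P(G=k \mid X=x)}{P(G=l \mid X=x)}
  = \log\frac{f_k(x)}{f_l(x)} + \log\frac{\pi_k}{\pi_l}
  = \log\frac{\pi_k}{\pi_l}
    - \tfrac{1}{2}(\mu_k + \mu_l)^T \Sigma^{-1} (\mu_k - \mu_l)
    + x^T \Sigma^{-1} (\mu_k - \mu_l)
```
The quadratic terms $x^T \Sigma^{-1} x$ cancel because the covariance is common to all classes, leaving an expression linear in x; setting it to zero gives the linear decision boundary.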
Linear Discriminant Analysis
In practice the parameters are replaced by their empirical estimates: $\hat\pi_k = N_k/N$, $\hat\mu_k = \sum_{g_i = k} x_i / N_k$, and the pooled covariance $\hat\Sigma = \sum_{k=1}^{K}\sum_{g_i = k}(x_i - \hat\mu_k)(x_i - \hat\mu_k)^T / (N - K)$.
LDA was among the top classifiers in the STATLOG study (Michie et al., 1994), presumably because the data supports linear boundaries and the estimates are stable.
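A minimal sketch of LDA with these empirical estimates (NumPy only; the function names fit_lda and lda_scores are illustrative, not from the slides):
```python
# LDA: estimate priors, class means, pooled covariance, then classify
# by the largest linear discriminant function delta_k(x).
import numpy as np

def fit_lda(X, g, K):
    """Empirical estimates pi_hat, mu_hat, and the pooled covariance."""
    N, p = X.shape
    pi_hat = np.zeros(K)
    mu_hat = np.zeros((K, p))
    Sigma_hat = np.zeros((p, p))
    for k in range(K):
        Xk = X[g == k]
        Nk = Xk.shape[0]
        pi_hat[k] = Nk / N
        mu_hat[k] = Xk.mean(axis=0)
        Sigma_hat += (Xk - mu_hat[k]).T @ (Xk - mu_hat[k])
    Sigma_hat /= (N - K)                      # pooled covariance estimate
    return pi_hat, mu_hat, Sigma_hat

def lda_scores(X, pi_hat, mu_hat, Sigma_hat):
    """delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log pi_k."""
    Sigma_inv = np.linalg.inv(Sigma_hat)
    return (X @ Sigma_inv @ mu_hat.T
            - 0.5 * np.sum(mu_hat @ Sigma_inv * mu_hat, axis=1)
            + np.log(pi_hat))                  # shape (N, K)

# Usage: g_hat = lda_scores(X, *fit_lda(X, g, K)).argmax(axis=1)
```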
Logistic Regression
Models the posterior probabilities of the K classes so that they sum to one and remain in [0, 1]:
$$\log\frac{P(G = k \mid X = x)}{P(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x, \qquad k = 1, \dots, K-1$$
The decision boundaries, the sets where the log-odds between two classes equal zero, are linear in x.
Logistic Regression
Model fit: the parameters are estimated by maximizing the conditional log-likelihood; the Newton-Raphson algorithm (equivalently, iteratively reweighted least squares) is used to solve the score equations.
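A minimal sketch of the Newton-Raphson updates for the two-class case (NumPy only; the name fit_logistic is illustrative, and X is assumed to already include an intercept column):
```python
# Newton-Raphson / IRLS for two-class logistic regression, y in {0, 1}.
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Maximize the conditional log-likelihood by Newton-Raphson."""
    N, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))        # fitted probabilities P(G=1|x)
        W = mu * (1.0 - mu)                     # diagonal of the weight matrix
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * W[:, None])           # X^T W X
        step = np.linalg.solve(hess, grad)      # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```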
Linear Regression
Recall the usual assumptions of multivariate linear regression (e.g., no severe multicollinearity). Here, with N instances collected in an N x p observation matrix X, Y is an N x K indicator response matrix: row i has a 1 in the column of the class of observation i and 0 elsewhere.
Linear Regression
The K columns of Y are regressed on X simultaneously, giving fitted values $\hat Y = X(X^T X)^{-1} X^T Y$; a new observation x is assigned to the class with the largest fitted value.
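A minimal sketch of classification by linear regression of the indicator matrix under the setup above (NumPy only; the function names are illustrative):
```python
# Classification via least-squares regression of an indicator response matrix.
# X is N x p (without intercept), g holds integer class labels 0..K-1.
import numpy as np

def indicator_regression(X, g, K):
    """Fit the (p+1) x K coefficient matrix B_hat by least squares."""
    N = X.shape[0]
    X1 = np.hstack([np.ones((N, 1)), X])       # add intercept column
    Y = np.zeros((N, K))
    Y[np.arange(N), g] = 1.0                   # indicator responses
    B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return B_hat

def predict(X, B_hat):
    """Assign each observation to the class with the largest fitted value."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return (X1 @ B_hat).argmax(axis=1)
```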
LDA vs. LR
Similar results, with LDA slightly better (56% vs. 67% error rate for LR). At first glance the two methods may seem identical, since both lead to linear decision boundaries (compare the forms shown earlier), but they differ in how the parameters are fit.
LDA vs. LR
LDA: parameters are fit by maximizing the full log-likelihood based on the joint density $P(X, G = k) = \phi(x; \mu_k, \Sigma)\,\pi_k$, which assumes a Gaussian density for X (Efron 1975: ignoring the Gaussian information, as LR does, costs about 30% efficiency in the worst case). Linearity is derived.
LR: the marginal density P(X) is left arbitrary (an advantage for model selection and for the ability to absorb extreme X values); the parameters of P(G|X) are fit by maximizing the conditional likelihood. Linearity is assumed.
In a word – separating hyperplanes