CSCE833 Machine Learning Lecture 9 Linear Discriminant Analysis

Slides:

Advertisements

Similar presentations

CHAPTER 13: Alpaydin: Kernel Machines

Advertisements

Generative Models Thus far we have essentially considered techniques that perform classification indirectly by modeling the training data, optimizing.

INTRODUCTION TO Machine Learning 2nd Edition

Notes Sample vs distribution “m” vs “µ” and “s” vs “σ” Bias/Variance Bias: Measures how much the learnt model is wrong disregarding noise Variance: Measures.

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

CHAPTER 10: Linear Discrimination

Support Vector Machines

CSCI 347 / CS 4206: Data Mining Module 07: Implementations Topic 03: Linear Models.

INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

INTRODUCTION TO Machine Learning 3rd Edition

Supervised Learning Recap

Lecture 13 – Perceptrons Machine Learning March 16, 2010.

Classification and Prediction: Regression Via Gradient Descent Optimization Bamshad Mobasher DePaul University.

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Lecture 14 – Neural Networks

Support Vector Machines (and Kernel Methods in general)

Support Vector Machines and Kernel Methods

Greg GrudicIntro AI1 Introduction to Artificial Intelligence CSCI 3202: The Perceptron Algorithm Greg Grudic.

SVMs Finalized. Where we are Last time Support vector machines in grungy detail The SVM objective function and QP Today Last details on SVMs Putting it.

Lecture 10: Support Vector Machines

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.

Ch. Eick: Support Vector Machines: The Main Ideas Reading Material Support Vector Machines: 1.Textbook 2. First 3 columns of Smola/Schönkopf article on.

Outline Separating Hyperplanes – Separable Case

Learning Theory Reza Shadmehr logistic regression, iterative re-weighted least squares.

CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.

Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.

Linear Discrimination Reading: Chapter 2 of textbook.

Linear Models for Classification

Linear Methods for Classification Based on Chapter 4 of Hastie, Tibshirani, and Friedman David Madigan.

Final Exam Review CS479/679 Pattern Recognition Dr. George Bebis 1.

Text Classification using Support Vector Machine Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata.

Support-Vector Networks C Cortes and V Vapnik (Tue) Computational Models of Intelligence Joon Shik Kim.

Linear Classifiers Dept. Computer Science & Engineering, Shanghai Jiao Tong University.

Greg GrudicIntro AI1 Support Vector Machine (SVM) Classification Greg Grudic.

INTRODUCTION TO MACHINE LEARNING 3RD EDITION ETHEM ALPAYDIN © The MIT Press, Lecture.

SVMs in a Nutshell.

INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

A Brief Introduction to Support Vector Machine (SVM) Most slides were from Prof. A. W. Moore, School of Computer Science, Carnegie Mellon University.

ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.

CMPS 142/242 Review Section Fall 2011 Adapted from Lecture Slides.

Machine Learning 12. Local Models.

Neural networks and support vector machines

PREDICT 422: Practical Machine Learning

ECE 5424: Introduction to Machine Learning

Geometrical intuition behind the dual problem

LECTURE 16: SUPPORT VECTOR MACHINES

Announcements HW4 due today (11:59pm) HW5 out today (due 11/17 11:59pm)

Jan Rupnik Jozef Stefan Institute

An Introduction to Support Vector Machines

LINEAR AND NON-LINEAR CLASSIFICATION USING SVM and KERNELS

Support Vector Machines

Pattern Recognition CS479/679 Pattern Recognition Dr. George Bebis

Linear machines 28/02/2017.

Mathematical Foundations of BME Reza Shadmehr

INTRODUCTION TO Machine Learning

LECTURE 17: SUPPORT VECTOR MACHINES

INTRODUCTION TO Machine Learning

Support Vector Machines

Machine Learning Perceptron: Linearly Separable Supervised Learning

Support vector machines

Linear Discrimination

Review for test #3 Radial basis functions SVM SOM.

CISC 841 Bioinformatics (Fall 2007) Kernel Based Methods (I)

SVMs for Document Ranking

Introduction to Machine Learning

Presentation transcript:

CSCE833 Machine Learning Lecture 9 Linear Discriminant Analysis Dr. Jianjun Hu mleg.cse.sc.edu/edu/csce833 University of South Carolina Department of Computer Science and Engineering

Likelihood- vs. Discriminant-based Classification Likelihood-based: Assume a model for p(x|Ci), use Bayes’ rule to calculate P(Ci|x) gi(x) = log P(Ci|x) Discriminant-based: Assume a model for gi(x|Φi); no density estimation Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Example: Boundary or Class Model Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Linear Discriminant Linear discriminant: Advantages: Simple: O(d) space/computation Knowledge extraction: Weighted sum of attributes; positive/negative weights, magnitudes (credit scoring) Optimal when p(x|Ci) are Gaussian with shared cov matrix; useful when classes are (almost) linearly separable Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Generalized Linear Model Quadratic discriminant: Higher-order (product) terms: Map from x to z using nonlinear basis functions and use a linear discriminant in z-space Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Two Classes Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Geometry Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Multiple Classes Classes are linearly separable Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Pairwise Separation Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

From Discriminants to Posteriors When p (x | Ci ) ~ N ( μi , ∑) Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Sigmoid (Logistic) Function Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Gradient-Descent E(w|X) is error with parameters w on sample X w*=arg minw E(w | X) Gradient Gradient-descent: Starts from random w and updates w iteratively in the negative direction of gradient Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Gradient-Descent η E (wt) E (wt+1) wt wt+1 Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Logistic Discrimination Two classes: Assume log likelihood ratio is linear Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Training: Two Classes Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Training: Gradient-Descent Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

100 1000 10 Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

K>2 Classes softmax Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Example Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Generalizing the Linear Model Quadratic: Sum of basis functions: where φ(x) are basis functions Kernels in SVM Hidden units in neural networks Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Discrimination by Regression Classes are NOT mutually exclusive and exhaustive Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Optimal Separating Hyperplane (Cortes and Vapnik, 1995; Vapnik, 1995) Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Margin Distance from the discriminant to the closest instances on either side Distance of x to the hyperplane is We require For a unique sol’n, fix ρ||w||=1and to max margin Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Most αt are 0 and only a small number have αt >0; they are the support vectors Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Soft Margin Hyperplane Not linearly separable Soft error New primal is Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Kernel Machines Preprocess input x by basis functions z = φ(x) g(z)=wTz g(x)=wT φ(x) The SVM solution Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

Kernel Functions Polynomials of degree q: Radial-basis functions: Sigmoidal functions: (Cherkassky and Mulier, 1998) Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)

SVM for Regression Use a linear model (possibly kernelized) f(x)=wTx+w0 Use the є-sensitive error function Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)