Lecture Slides for Introduction to Machine Learning 2e, Ethem Alpaydın © The MIT Press
Likelihood- vs. Discriminant-based Classification
- Likelihood-based: assume a model for $p(x|C_i)$, use Bayes' rule to calculate $P(C_i|x)$, and take $g_i(x) = \log P(C_i|x)$.
- Discriminant-based: assume a model for $g_i(x|\Phi_i)$ directly; no density estimation.
- Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries.
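For reference, the Bayes' rule step that the likelihood-based approach relies on:

$$
P(C_i \mid x) = \frac{p(x \mid C_i)\,P(C_i)}{p(x)} = \frac{p(x \mid C_i)\,P(C_i)}{\sum_j p(x \mid C_j)\,P(C_j)}
$$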
Linear Discriminant
- Advantages:
  - Simple: O(d) space and computation.
  - Knowledge extraction: the discriminant is a weighted sum of attributes; the signs and magnitudes of the weights are interpretable (e.g., credit scoring).
  - Optimal when the $p(x|C_i)$ are Gaussian with a shared covariance matrix; useful when classes are (almost) linearly separable.
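In the chapter's usual notation, the linear discriminant is

$$
g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0} = \sum_{j=1}^{d} w_{ij} x_j + w_{i0}
$$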
Generalized Linear Model
- Quadratic discriminant: add higher-order (product) terms of the inputs.
- Equivalently, map from x to z using nonlinear basis functions and use a linear discriminant in z-space.
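As a worked instance: the quadratic discriminant

$$
g_i(x \mid W_i, w_i, w_{i0}) = x^T W_i x + w_i^T x + w_{i0}
$$

is linear in the z-space obtained, for $d = 2$, from the basis functions

$$
z_1 = x_1, \quad z_2 = x_2, \quad z_3 = x_1^2, \quad z_4 = x_2^2, \quad z_5 = x_1 x_2
$$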
Two Classes
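For two classes, a single discriminant suffices: subtracting the two linear discriminants gives

$$
g(x) = g_1(x) - g_2(x) = (w_1 - w_2)^T x + (w_{10} - w_{20}) = w^T x + w_0
$$

and we choose $C_1$ if $g(x) > 0$ and $C_2$ otherwise.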
Geometry
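The geometry of $g(x) = w^T x + w_0$: the weight vector $w$ is normal to the decision hyperplane $g(x) = 0$, and

$$
r = \frac{g(x)}{\lVert w \rVert}, \qquad r_0 = \frac{w_0}{\lVert w \rVert}
$$

give the signed distance to the hyperplane of a point $x$ and of the origin, respectively.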
Multiple Classes
- Classes are linearly separable.
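With one linear discriminant per class, we assign x to the class whose discriminant is largest:

$$
g_i(x \mid w_i, w_{i0}) = w_i^T x + w_{i0}, \qquad \text{choose } C_i \text{ if } g_i(x) = \max_j g_j(x)
$$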
Pairwise Separation
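When the classes are not separable one from all the rest but are separable in pairs, a discriminant per pair of classes can be used:

$$
g_{ij}(x \mid w_{ij}, w_{ij0}) = w_{ij}^T x + w_{ij0}
$$

choosing $C_i$ if $g_{ij}(x) > 0$ for all $j \ne i$.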
From Discriminants to Posteriors
- When $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma)$ (Gaussian classes with a shared covariance matrix), the posterior can be written in terms of a linear discriminant.
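Concretely, the log-odds is then linear in x, so the posterior is the sigmoid of a linear discriminant:

$$
\log \frac{P(C_1 \mid x)}{P(C_2 \mid x)} = w^T x + w_0
\;\Rightarrow\;
P(C_1 \mid x) = \frac{1}{1 + \exp\!\left(-(w^T x + w_0)\right)}
$$

with $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $w_0 = -\tfrac{1}{2}(\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + \log \dfrac{P(C_1)}{P(C_2)}$.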
Sigmoid (Logistic) Function
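The sigmoid maps the linear discriminant value to a probability in (0, 1); its inverse, the logit, recovers the linear form:

$$
\operatorname{sigmoid}(a) = \frac{1}{1 + e^{-a}}, \qquad \operatorname{logit}(y) = \log \frac{y}{1 - y}
$$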
Gradient-Descent
- $E(w \mid X)$ is the error with parameters $w$ on sample $X$; we seek $w^* = \arg\min_w E(w \mid X)$.
- Gradient: $\nabla_w E = \left[\dfrac{\partial E}{\partial w_1}, \dfrac{\partial E}{\partial w_2}, \ldots, \dfrac{\partial E}{\partial w_d}\right]^T$
- Gradient descent starts from a random $w$ and updates it iteratively in the negative direction of the gradient: $\Delta w_i = -\eta \dfrac{\partial E}{\partial w_i}$, then $w_i \leftarrow w_i + \Delta w_i$.
Gradient-Descent
[Figure: error curve $E(w)$; one step moves from $w^t$ to $w^{t+1}$ in the negative gradient direction with step size $\eta$, decreasing the error from $E(w^t)$ to $E(w^{t+1})$.]
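A minimal sketch of the update rule in code, assuming a generic differentiable error function (the names `gradient_descent` and `grad` are illustrative, not from the book):

```python
import numpy as np

def gradient_descent(grad, w0, eta=0.1, n_iters=100):
    """Minimize E(w) by repeatedly stepping against its gradient.

    grad -- function returning the gradient of E at w (assumed given)
    w0   -- initial parameter vector, e.g. drawn at random
    eta  -- learning rate (step size)
    """
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        w = w - eta * grad(w)  # w^{t+1} = w^t - eta * dE/dw
    return w

# Toy usage: E(w) = ||w - 3||^2 has gradient 2(w - 3) and minimum at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3.0), w0=[0.0, 0.0])
```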
Logistic Discrimination
- Two classes: assume the log likelihood ratio is linear.
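Spelling out the assumption: the model is placed on the ratio of the class likelihoods, and Bayes' rule then yields a logistic posterior:

$$
\log \frac{p(x \mid C_1)}{p(x \mid C_2)} = w^T x + w_0^{\circ}
\;\Rightarrow\;
y = \hat{P}(C_1 \mid x) = \frac{1}{1 + \exp\!\left(-(w^T x + w_0)\right)}
$$

where $w_0 = w_0^{\circ} + \log \dfrac{P(C_1)}{P(C_2)}$.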
Training: Two Classes
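Treating the labels $r^t \in \{0, 1\}$ as Bernoulli variables with parameter $y^t$, maximizing the sample likelihood is equivalent to minimizing the cross-entropy error:

$$
E(w, w_0 \mid X) = -\sum_t \left[ r^t \log y^t + (1 - r^t) \log (1 - y^t) \right],
\qquad y^t = \operatorname{sigmoid}(w^T x^t + w_0)
$$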
Training: Gradient-Descent
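Differentiating the cross-entropy error gives the simple batch updates $\Delta w_j = \eta \sum_t (r^t - y^t) x_j^t$ and $\Delta w_0 = \eta \sum_t (r^t - y^t)$. A minimal runnable sketch (the names `sigmoid` and `train_logistic` are illustrative, and the gradient is averaged over the sample here):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_logistic(X, r, eta=0.5, n_epochs=1000):
    """Two-class logistic discrimination trained by batch gradient descent.

    X -- (N, d) array of inputs
    r -- (N,) array of labels in {0, 1}
    Returns the weight vector w and bias w0.
    """
    N, d = X.shape
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.01, 0.01, d)   # small random initial weights
    w0 = 0.0
    for _ in range(n_epochs):
        y = sigmoid(X @ w + w0)       # current posterior estimates y^t
        # Cross-entropy gradient: Delta w_j = eta * sum_t (r^t - y^t) x_j^t
        w += eta * (X.T @ (r - y)) / N
        w0 += eta * np.mean(r - y)
    return w, w0
```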
K>2 Classes
- Use the softmax function to turn K linear discriminants into posterior probabilities.
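Taking one class, say $C_K$, as the reference and assuming linear log likelihood ratios gives posteriors in softmax form:

$$
y_i = \hat{P}(C_i \mid x) = \frac{\exp(w_i^T x + w_{i0})}{\sum_{j=1}^{K} \exp(w_j^T x + w_{j0})}, \qquad i = 1, \ldots, K
$$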
Generalizing the Linear Model
- Quadratic: $\log \dfrac{p(x \mid C_1)}{p(x \mid C_2)} = x^T W x + w^T x + w_0$
- Sum of basis functions: $\log \dfrac{p(x \mid C_1)}{p(x \mid C_2)} = \sum_j w_j \phi_j(x)$, where the $\phi_j(x)$ are basis functions, e.g.:
  - Hidden units in neural networks (Chapters 11 and 12)
  - Kernels in SVM (Chapter 13)
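The same idea in code: expand x into a hand-built z-space and reuse any linear trainer there. This sketch uses the quadratic basis from earlier (the function name `quadratic_basis` is illustrative):

```python
import numpy as np

def quadratic_basis(X):
    """Map 2-D inputs x = (x1, x2) to z = (x1, x2, x1^2, x2^2, x1*x2).

    A discriminant that is linear in z is quadratic in the original x.
    """
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# Usage: train the two-class logistic discriminant from earlier in z-space.
# w, w0 = train_logistic(quadratic_basis(X), r)
```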
Discrimination by Regression
- Classes are NOT mutually exclusive and exhaustive; each label is fit as a separate regression target.
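The standard formulation uses a sigmoid output with Gaussian noise on the labels, so maximizing the likelihood reduces to least squares:

$$
r^t = y^t + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2), \qquad
y^t = \operatorname{sigmoid}(w^T x^t + w_0)
$$

$$
E(w, w_0 \mid X) = \frac{1}{2} \sum_t (r^t - y^t)^2
$$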