CS 2750: Machine Learning
Linear Models for Classification
Prof. Adriana Kovashka, University of Pittsburgh
February 15, 2016
Plan for Today
– Regression for classification
– Fisher’s linear discriminant
– Perceptron
– Logistic regression
– Multi-way classification
– Generative vs. discriminative models
The effect of outliers
Magenta = least squares, green = logistic regression
Figures from Bishop
With three classes
Left = least squares, right = logistic regression
Figures from Bishop
Fisher’s linear discriminant
Figures from Bishop
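The figures accompany Fisher’s criterion for a two-class problem; as a reminder, the standard form from Bishop (added here, not transcribed from the slides) is:

```latex
% Fisher criterion: between-class separation over within-class scatter,
% with projected class means m_1, m_2 and scatter matrices S_B, S_W.
J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2}
              = \frac{\mathbf{w}^{\top}\mathbf{S}_B\,\mathbf{w}}{\mathbf{w}^{\top}\mathbf{S}_W\,\mathbf{w}},
\qquad
\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)
```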
The perceptron algorithm (Rosenblatt, 1962)
Prediction rule: y(x) = f(w^T φ(x)), where f(a) = +1 if a ≥ 0 and −1 otherwise
Loss (perceptron criterion): E_P(w) = −Σ_{n ∈ M} w^T φ(x_n) t_n, using only the set M of misclassified examples
The perceptron algorithm
Loss: E_P(w) = −Σ_{n ∈ M} w^T φ(x_n) t_n
Learning algorithm update rule: w^(τ+1) = w^(τ) + η φ(x_n) t_n, applied to a misclassified example (x_n, t_n)
Interpretation:
– If a sample is misclassified, make the weight vector more like it
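A minimal NumPy sketch of this update rule (illustrative code, not from the slides; it assumes labels t ∈ {−1, +1} and that X already holds the feature vectors φ(x), including any bias term):

```python
import numpy as np

def perceptron_train(X, t, lr=1.0, max_epochs=100):
    """Perceptron learning: cycle through examples, update w on mistakes.
    X: (n_samples, n_features) rows of phi(x); t: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x_n, t_n in zip(X, t):
            # Update when t_n * w^T x_n <= 0 (misclassified or on the boundary)
            if t_n * np.dot(w, x_n) <= 0:
                w += lr * t_n * x_n      # make w more like the misclassified sample
                mistakes += 1
        if mistakes == 0:                # converged: all training points correct
            break
    return w

def perceptron_predict(X, w):
    return np.where(X @ w >= 0, 1, -1)
```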
The perceptron algorithm
Figures from Bishop
Maximum Likelihood Estimation (MLE)
– We have a probabilistic model, M, of some phenomenon, but we do not know its parameters, θ. Each “execution” of M produces an observation, x[i], according to the (unknown) distribution induced by M.
– Goal: after observing x[1], …, x[n], estimate the model parameters, θ, that generated the observed data.
– This parameter vector can be used to predict future data.
– MLE principle: choose the parameters that maximize the likelihood of the data.
Adapted from Nir Friedman
Maximum Likelihood Estimation (MLE)
The likelihood of the observed data, given the model parameters θ, is the conditional probability that the model M, with parameters θ, produces x[1], …, x[n]:
L(θ) = Pr(x[1], …, x[n] | θ, M)
– In MLE we seek the model parameters θ that maximize the likelihood.
Adapted from Nir Friedman
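Assuming the observations are drawn i.i.d. from the model (a standard step not spelled out on the slide), the likelihood factorizes and one usually maximizes the log-likelihood instead:

```latex
L(\theta) = \prod_{i=1}^{n} \Pr(x[i] \mid \theta, M),
\qquad
\hat{\theta}_{\mathrm{MLE}}
  = \arg\max_{\theta} \log L(\theta)
  = \arg\max_{\theta} \sum_{i=1}^{n} \log \Pr(x[i] \mid \theta, M)
```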
Example: Binomial Experiment
– When tossed, the object can land in one of two positions: Head (H) or Tail (T).
– We denote by θ the (unknown) probability P(H).
– Estimation task: given a sequence of toss samples x[1], x[2], …, x[M], we want to estimate the probabilities P(H) = θ and P(T) = 1 − θ.
Adapted from Nir Friedman
The Likelihood Function
– How good is a particular θ? It depends on how likely it is to generate the observed data.
– The likelihood for the sequence H, T, T, H, H is L(θ) = θ (1 − θ) (1 − θ) θ θ = θ^3 (1 − θ)^2
– Taking the derivative and equating it to 0, we get θ = 3/5 = 0.6, the fraction of heads in the sequence.
Adapted from Nir Friedman
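A small numeric check of this example (illustrative, not from the slides): the closed-form MLE is the empirical frequency of heads, and a grid search over θ agrees.

```python
import numpy as np

tosses = np.array([1, 0, 0, 1, 1])           # H, T, T, H, H encoded as 1/0

# Closed-form MLE: fraction of heads
theta_mle = tosses.mean()

# Grid search over theta as a sanity check
thetas = np.linspace(0.001, 0.999, 999)
likelihood = thetas ** tosses.sum() * (1 - thetas) ** (len(tosses) - tosses.sum())
theta_grid = thetas[np.argmax(likelihood)]

print(theta_mle, theta_grid)                 # both are (approximately) 0.6
```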
Issues
– Overconfidence: with few samples, MLE can commit to extreme estimates (e.g., observing only heads gives P(T) = 0)
– Better: maximum a posteriori (MAP)
Multi-class problems
Instead of just two classes, we now have C classes
– E.g. predict which movie genre a viewer likes best
– Possible answers: action, drama, indie, thriller, etc.
Two approaches:
– One-vs-all
– One-vs-one
Multi-class problems
One-vs-all (a.k.a. one-vs-others)
– Train K classifiers, one per class
– In each, pos = data from class i, neg = data from all classes other than i
– The class with the most confident prediction wins
– Example: you have 4 classes, so train 4 classifiers
  1 vs others: score 3.5
  2 vs others: score 6.2
  3 vs others: score 1.4
  4 vs others: score 5.5
  Final prediction: class 2
– Issues?
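A minimal sketch of the one-vs-all decision (illustrative; it assumes we already have K trained binary scorers, each returning a real-valued confidence for its class):

```python
import numpy as np

def one_vs_all_predict(x, scorers):
    """scorers: list of K functions; scorers[i](x) = confidence that x is class i."""
    scores = np.array([score(x) for score in scorers])
    return int(np.argmax(scores))    # class with the most confident prediction wins

# Scores from the slide example (1-based class labels, so add 1 to the index)
scorers = [lambda x, s=s: s for s in (3.5, 6.2, 1.4, 5.5)]
print(one_vs_all_predict(None, scorers) + 1)   # -> 2
```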
Multi-class problems
One-vs-one (a.k.a. all-vs-all)
– Train K(K−1)/2 binary classifiers (all pairs of classes)
– They all vote for the label
– Example: you have 4 classes, so train 6 classifiers
  1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4
  Votes: 1, 1, 4, 2, 4, 4
  Final prediction: class 4
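A matching sketch of one-vs-one voting (illustrative; it assumes each pairwise classifier returns the winning class label of its pair):

```python
from collections import Counter
from itertools import combinations

def one_vs_one_predict(x, pairwise_classifiers):
    """pairwise_classifiers: dict mapping a class pair (i, j) to a function
    that returns either i or j for input x. The majority vote wins."""
    votes = Counter(clf(x) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]

# Votes from the slide example: 1, 1, 4, 2, 4, 4 -> class 4 wins
pairs = list(combinations([1, 2, 3, 4], 2))           # (1,2), (1,3), ..., (3,4)
slide_votes = dict(zip(pairs, [1, 1, 4, 2, 4, 4]))
classifiers = {pair: (lambda x, v=v: v) for pair, v in slide_votes.items()}
print(one_vs_one_predict(None, classifiers))          # -> 4
```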
Multi-class problems
What are some problems with these approaches to multi-class classification?
– There are “natively multi-class” methods
Figures from Bishop
Generative models
Binary case:
p(C_1 | x) = p(x | C_1) p(C_1) / [p(x | C_1) p(C_1) + p(x | C_2) p(C_2)] = σ(a),
where a = ln [p(x | C_1) p(C_1) / (p(x | C_2) p(C_2))] and σ(a) = 1 / (1 + exp(−a)) is the logistic sigmoid
Generative models
Multi-class case:
p(C_k | x) = p(x | C_k) p(C_k) / Σ_j p(x | C_j) p(C_j) = exp(a_k) / Σ_j exp(a_j),
where a_k = ln [p(x | C_k) p(C_k)] (the normalized exponential, or softmax)
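A short sketch of this recipe with Gaussian class-conditional densities (illustrative; the choice of Gaussians and all names here are assumptions, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def generative_posteriors(x, class_models, priors):
    """class_models: fitted class-conditional densities p(x | C_k); priors: p(C_k).
    Returns p(C_k | x) via the softmax of a_k = log p(x | C_k) + log p(C_k)."""
    a = np.array([m.logpdf(x) + np.log(p) for m, p in zip(class_models, priors)])
    a -= a.max()                      # subtract max for numerical stability
    post = np.exp(a)
    return post / post.sum()

# Example: two Gaussian classes in 2D with equal priors
models = [multivariate_normal(mean=[0, 0], cov=np.eye(2)),
          multivariate_normal(mean=[2, 2], cov=np.eye(2))]
print(generative_posteriors(np.array([1.5, 1.5]), models, priors=[0.5, 0.5]))
```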
Generative models
Why are these called generative?
– We can use them to generate new samples x
– Perhaps this is overkill?
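Generating a new sample is ancestral sampling: pick a class from the prior, then draw x from that class-conditional density. A minimal sketch under the same Gaussian assumption as above (illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
priors = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
covs = [np.eye(2), np.eye(2)]

# Ancestral sampling: pick a class k from the prior, then draw x from p(x | C_k)
k = rng.choice(len(priors), p=priors)
x_new = rng.multivariate_normal(means[k], covs[k])
print(k, x_new)
```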