Linear Models for Classification
Berkay Topçu
Linear Models for Classification
Goal: take an input vector $\mathbf{x}$ and assign it to one of $K$ classes $C_k$, where $k = 1, \dots, K$.
The classes are separated by linear decision boundaries.
Generalized Linear Models
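We wish to predict discrete class labels or, more generally, class posterior probabilities that lie in the range (0, 1).
The classification model is a linear function of the parameters passed through a fixed nonlinear activation function $f$: $y(\mathbf{x}) = f(\mathbf{w}^T\mathbf{x} + w_0)$.
Classification is done directly in the original input space $\mathbf{x}$, or after a fixed nonlinear transformation of the input variables using a vector of basis functions $\boldsymbol{\phi}(\mathbf{x})$.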
Discriminant Functions
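Linear discriminants: $y(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0$.
If $y(\mathbf{x}) \ge 0$, assign $\mathbf{x}$ to class $C_1$, and to class $C_2$ otherwise.
The decision boundary is given by $y(\mathbf{x}) = 0$.
$\mathbf{w}$ determines the orientation of the decision surface and $w_0$ determines its location.
Compact notation: $y(\mathbf{x}) = \tilde{\mathbf{w}}^T\tilde{\mathbf{x}}$, with $\tilde{\mathbf{w}} = (w_0, \mathbf{w})$ and $\tilde{\mathbf{x}} = (1, \mathbf{x})$.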
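A minimal sketch of the two-class rule, assuming the usual linear form y(x) = w^T x + w0 and the convention that y(x) >= 0 means class C1. The function names and the example boundary are illustrative, not taken from the slides.

```python
def discriminant(w, w0, x):
    """Evaluate y(x) = w^T x + w0 for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def classify(w, w0, x):
    """Return 1 (class C1) when y(x) >= 0, otherwise 2 (class C2)."""
    return 1 if discriminant(w, w0, x) >= 0 else 2


# Hypothetical decision boundary x1 + x2 - 1 = 0.
w, w0 = [1.0, 1.0], -1.0
label_a = classify(w, w0, [2.0, 0.5])   # lies on the positive side
label_b = classify(w, w0, [0.0, 0.0])   # lies on the negative side
```

Scaling w and w0 by the same positive constant leaves every decision unchanged, which is why only the direction of w and the offset relative to its length matter.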
Multiple Classes
A K-class discriminant can be built by combining a number of two-class discriminant functions (K > 2).
One-versus-the-rest: each classifier separates points in one particular class $C_k$ from points not in that class.
One-versus-one: $K(K-1)/2$ binary discriminant functions, one for every pair of classes.
Multiple Classes
A single K-class discriminant comprising $K$ linear functions $y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$.
Assign $\mathbf{x}$ to class $C_k$ if $y_k(\mathbf{x}) > y_j(\mathbf{x})$ for all $j \neq k$.
How do we learn the parameters of linear discriminant functions?
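The single K-class discriminant can be sketched as an argmax over K linear scores. This is only an illustration: the weights below are made up, and classes are indexed from 0 for convenience.

```python
def predict_class(W, b, x):
    """Return the index k that maximizes y_k(x) = w_k^T x + w_k0."""
    scores = [sum(wi * xi for wi, xi in zip(w_k, x)) + b_k
              for w_k, b_k in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical weights for K = 3 classes in two dimensions.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
b = [0.0, 0.0, 0.0]
k_first = predict_class(W, b, [2.0, 1.0])     # scores are 2, 1, -3
k_second = predict_class(W, b, [0.5, 3.0])    # scores are 0.5, 3, -3.5
k_third = predict_class(W, b, [-1.0, -2.0])   # scores are -1, -2, 3
```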
Least Squares for Classification
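Each class $C_k$ is described by its own linear model $y_k(\mathbf{x}) = \mathbf{w}_k^T\mathbf{x} + w_{k0}$, or in compact form $\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T\tilde{\mathbf{x}}$.
Training data set $\{\mathbf{x}_n, \mathbf{t}_n\}$ for $n = 1, \dots, N$, where $\mathbf{t}_n$ is a 1-of-K target vector.
$\mathbf{T}$ is the matrix whose nth row is the vector $\mathbf{t}_n^T$, and $\tilde{\mathbf{X}}$ is the matrix whose nth row is $\tilde{\mathbf{x}}_n^T$.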
Least Squares for Classification
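Minimizing the sum-of-squares error function $E_D(\tilde{\mathbf{W}}) = \frac{1}{2}\mathrm{Tr}\{(\tilde{\mathbf{X}}\tilde{\mathbf{W}} - \mathbf{T})^T(\tilde{\mathbf{X}}\tilde{\mathbf{W}} - \mathbf{T})\}$.
Solution: $\tilde{\mathbf{W}} = (\tilde{\mathbf{X}}^T\tilde{\mathbf{X}})^{-1}\tilde{\mathbf{X}}^T\mathbf{T} = \tilde{\mathbf{X}}^{\dagger}\mathbf{T}$.
Discriminant function: $\mathbf{y}(\mathbf{x}) = \tilde{\mathbf{W}}^T\tilde{\mathbf{x}} = \mathbf{T}^T(\tilde{\mathbf{X}}^{\dagger})^T\tilde{\mathbf{x}}$.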
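As a rough illustration of the least-squares solution, the sketch below solves the normal equations W = (X^T X)^{-1} X^T T in closed form for one-dimensional inputs with a bias column, so that X^T X is only 2x2. The 1-of-K target coding, the function names, and the toy data are assumptions made for this example.

```python
def fit_least_squares(xs, labels, num_classes):
    """Return one (w_k0, w_k1) pair per class for rows (1, x_n)."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    det = n * sxx - sx * sx                  # determinant of X^T X
    params = []
    for k in range(num_classes):
        t = [1.0 if label == k else 0.0 for label in labels]  # 1-of-K column
        st = sum(t)
        sxt = sum(x * tk for x, tk in zip(xs, t))
        # Apply the explicit 2x2 inverse of X^T X to X^T t.
        w0 = (sxx * st - sx * sxt) / det
        w1 = (n * sxt - sx * st) / det
        params.append((w0, w1))
    return params


def outputs(params, x):
    """Evaluate every y_k(x) = w_k0 + w_k1 * x."""
    return [w0 + w1 * x for w0, w1 in params]


def predict(params, x):
    ys = outputs(params, x)
    return max(range(len(ys)), key=ys.__getitem__)


# Toy data: class 0 clusters near x = 1, class 1 near x = 7.
xs = [0.0, 1.0, 2.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
params = fit_least_squares(xs, labels, num_classes=2)
```

With 1-of-K targets and a bias term, the outputs sum to one for every input, although they are not constrained to lie in (0, 1) and so cannot be read as probabilities.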
Fisher’s Linear Discriminant
Dimensionality reduction: take the D-dimensional input vector $\mathbf{x}$ and project it to one dimension using $y = \mathbf{w}^T\mathbf{x}$.
Choose the projection that maximizes class separation.
Two-class problem: $N_1$ points of $C_1$ and $N_2$ points of $C_2$.
Fisher's idea: a large separation between the projected class means, together with a small variance within each class, minimizes class overlap.
Fisher’s Linear Discriminant
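The Fisher criterion: $J(\mathbf{w}) = \frac{(m_2 - m_1)^2}{s_1^2 + s_2^2} = \frac{\mathbf{w}^T\mathbf{S}_B\mathbf{w}}{\mathbf{w}^T\mathbf{S}_W\mathbf{w}}$, where $m_k$ and $s_k^2$ are the projected mean and within-class variance of class $C_k$.
Between-class covariance: $\mathbf{S}_B = (\mathbf{m}_2 - \mathbf{m}_1)(\mathbf{m}_2 - \mathbf{m}_1)^T$.
Within-class covariance: $\mathbf{S}_W = \sum_{n \in C_1}(\mathbf{x}_n - \mathbf{m}_1)(\mathbf{x}_n - \mathbf{m}_1)^T + \sum_{n \in C_2}(\mathbf{x}_n - \mathbf{m}_2)(\mathbf{x}_n - \mathbf{m}_2)^T$.
Maximizing $J(\mathbf{w})$ gives $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$.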
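A small sketch of the two-class Fisher direction for two-dimensional data, assuming the standard result that the criterion J(w) = (m2 - m1)^2 / (s1^2 + s2^2) is maximized by w proportional to S_W^{-1}(m2 - m1), with S_W the within-class scatter. The data and helper names are invented for illustration.

```python
def fisher_direction(class1, class2):
    """Return w proportional to S_W^{-1} (m2 - m1) for 2-D points."""
    def mean(points):
        n = len(points)
        return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

    def scatter(points, m):
        # Entries (a, b, d) of the symmetric 2x2 scatter [[a, b], [b, d]].
        a = sum((p[0] - m[0]) ** 2 for p in points)
        b = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
        d = sum((p[1] - m[1]) ** 2 for p in points)
        return a, b, d

    m1, m2 = mean(class1), mean(class2)
    a1, b1, d1 = scatter(class1, m1)
    a2, b2, d2 = scatter(class2, m2)
    a, b, d = a1 + a2, b1 + b2, d1 + d2      # within-class scatter S_W
    det = a * d - b * b
    dm = [m2[0] - m1[0], m2[1] - m1[1]]
    # Multiply the explicit 2x2 inverse of S_W by (m2 - m1).
    return [(d * dm[0] - b * dm[1]) / det, (-b * dm[0] + a * dm[1]) / det]


def fisher_criterion(w, class1, class2):
    """J(w) computed directly from the projected points y = w^T x."""
    y1 = [w[0] * p[0] + w[1] * p[1] for p in class1]
    y2 = [w[0] * p[0] + w[1] * p[1] for p in class2]
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    s1 = sum((y - m1) ** 2 for y in y1)
    s2 = sum((y - m2) ** 2 for y in y2)
    return (m2 - m1) ** 2 / (s1 + s2)


c1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0)]
c2 = [(4.0, 1.0), (5.0, 2.0), (6.0, 3.0), (7.0, 3.0)]
w_fisher = fisher_direction(c1, c2)
```

Projecting onto the plain difference of class means ignores the within-class scatter, so its criterion value can never exceed the one obtained with `w_fisher`.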
Fisher’s Linear Discriminant
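For the two-class problem, the Fisher criterion is a special case of least squares (reference: Penalized Discriminant Analysis, Hastie, Buja and Tibshirani).
For multiple classes, with projection $\mathbf{y} = \mathbf{W}^T\mathbf{x}$: $J(\mathbf{W}) = \mathrm{Tr}\{(\mathbf{W}^T\mathbf{S}_W\mathbf{W})^{-1}(\mathbf{W}^T\mathbf{S}_B\mathbf{W})\}$.
The weight values are determined by the eigenvectors that correspond to the largest eigenvalues of $\mathbf{S}_W^{-1}\mathbf{S}_B$; at most $K-1$ of these eigenvalues are nonzero, because $\mathbf{S}_B$ has rank at most $K-1$.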
The Perceptron Algorithm
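The input vector $\mathbf{x}$ is transformed using a fixed nonlinear transformation $\boldsymbol{\phi}(\mathbf{x})$, giving $y(\mathbf{x}) = f(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}))$ with $f(a) = +1$ for $a \ge 0$ and $f(a) = -1$ otherwise.
Perceptron criterion: with targets $t_n \in \{-1, +1\}$, we want $\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n)\,t_n > 0$ for all training samples.
We need to minimize $E_P(\mathbf{w}) = -\sum_{n \in \mathcal{M}} \mathbf{w}^T\boldsymbol{\phi}_n t_n$, where $\mathcal{M}$ is the set of misclassified samples.
The Perceptron Algorithm - Stochastic Gradient Descent
Cycle through the training patterns in turn.
If the pattern is correctly classified, the weight vector remains unchanged; otherwise: $\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta\nabla E_P(\mathbf{w}) = \mathbf{w}^{(\tau)} + \eta\,\boldsymbol{\phi}_n t_n$.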
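The update loop can be sketched as follows, assuming targets coded as -1 or +1 and feature vectors that already include a constant 1 for the bias. The learning rate, epoch cap, and AND-style toy data are choices made for this example only.

```python
def train_perceptron(features, targets, eta=1.0, max_epochs=100):
    """Perceptron learning: add eta * phi_n * t_n for each misclassified pattern."""
    w = [0.0] * len(features[0])
    for _ in range(max_epochs):
        mistakes = 0
        for phi, t in zip(features, targets):
            activation = sum(wi * pi for wi, pi in zip(w, phi))
            if activation * t <= 0:          # misclassified or on the boundary
                w = [wi + eta * t * pi for wi, pi in zip(w, phi)]
                mistakes += 1
        if mistakes == 0:                    # a full pass without any update
            break
    return w


# Linearly separable toy problem (logical AND), first component is the bias.
features = [(1.0, 0.0, 0.0), (1.0, 0.0, 1.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
targets = [-1, -1, -1, 1]
w_perceptron = train_perceptron(features, targets)
```

For linearly separable data such as this, the loop stops after finitely many updates; for non-separable data it never settles, which is why the sketch caps the number of epochs.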
Probabilistic Generative Models
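Depend on simple assumptions about the distribution of the data.
Posterior for two classes: $p(C_1|\mathbf{x}) = \frac{p(\mathbf{x}|C_1)p(C_1)}{p(\mathbf{x}|C_1)p(C_1) + p(\mathbf{x}|C_2)p(C_2)} = \sigma(a)$, where $a = \ln\frac{p(\mathbf{x}|C_1)p(C_1)}{p(\mathbf{x}|C_2)p(C_2)}$.
Logistic sigmoid function: $\sigma(a) = \frac{1}{1 + \exp(-a)}$.
It maps the whole real axis to the finite interval (0, 1).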
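A short sketch of the logistic sigmoid, sigma(a) = 1 / (1 + exp(-a)), and of the two-class posterior written as the sigmoid of the log odds. The branch on the sign of `a` is a common way to avoid overflow in `exp`; the helper names and the likelihood values are hypothetical.

```python
import math


def sigmoid(a):
    """Logistic sigmoid, evaluated without overflowing for large |a|."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def posterior_c1(lik1, prior1, lik2, prior2):
    """p(C1 | x) as sigmoid(a), with a the log ratio of the joint terms."""
    a = math.log(lik1 * prior1) - math.log(lik2 * prior2)
    return sigmoid(a)


# Hypothetical likelihood values p(x|C1) = 0.2 and p(x|C2) = 0.6, equal priors.
p_c1 = posterior_c1(0.2, 0.5, 0.6, 0.5)
```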
Continuous Inputs - Gaussian
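Assuming the class-conditional densities are Gaussian with a shared covariance matrix: $p(\mathbf{x}|C_k) = \frac{1}{(2\pi)^{D/2}|\boldsymbol{\Sigma}|^{1/2}}\exp\{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_k)^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}_k)\}$.
Case of two classes: $p(C_1|\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + w_0)$, with $\mathbf{w} = \boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$ and $w_0 = -\frac{1}{2}\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_1 + \frac{1}{2}\boldsymbol{\mu}_2^T\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_2 + \ln\frac{p(C_1)}{p(C_2)}$.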
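To illustrate why Gaussian class-conditional densities lead to a linear argument inside the sigmoid, the sketch below works in one dimension and assumes both classes share the same variance. Under that assumption the quadratic terms cancel and p(C1|x) = sigmoid(w x + w0). The parameter values are arbitrary.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))


def gaussian_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_params(mu1, mu2, var, prior1):
    """Return (w, w0) such that p(C1|x) = sigmoid(w * x + w0)."""
    w = (mu1 - mu2) / var
    w0 = (mu2 ** 2 - mu1 ** 2) / (2.0 * var) + math.log(prior1 / (1.0 - prior1))
    return w, w0


def posterior_bayes(x, mu1, mu2, var, prior1):
    """The same posterior computed directly from Bayes' theorem."""
    p1 = gaussian_pdf(x, mu1, var) * prior1
    p2 = gaussian_pdf(x, mu2, var) * (1.0 - prior1)
    return p1 / (p1 + p2)


# Arbitrary example parameters.
mu1, mu2, var, prior1 = 1.0, 3.0, 0.5, 0.3
w, w0 = posterior_params(mu1, mu2, var, prior1)
```

The two routes agree at every input, and the prior only shifts the offset `w0`, which moves the decision boundary without rotating it.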
Maximum Likelihood Solution
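Likelihood function: $p(\mathbf{t}|\pi, \boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \boldsymbol{\Sigma}) = \prod_{n=1}^{N}[\pi\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_1, \boldsymbol{\Sigma})]^{t_n}[(1 - \pi)\,\mathcal{N}(\mathbf{x}_n|\boldsymbol{\mu}_2, \boldsymbol{\Sigma})]^{1 - t_n}$, with $t_n = 1$ for class $C_1$ and $\pi = p(C_1)$.
Maximizing the log-likelihood gives $\pi = \frac{N_1}{N}$, $\boldsymbol{\mu}_1 = \frac{1}{N_1}\sum_{n=1}^{N} t_n\mathbf{x}_n$, $\boldsymbol{\mu}_2 = \frac{1}{N_2}\sum_{n=1}^{N}(1 - t_n)\mathbf{x}_n$, and $\boldsymbol{\Sigma} = \frac{N_1}{N}\mathbf{S}_1 + \frac{N_2}{N}\mathbf{S}_2$, where $\mathbf{S}_k$ is the sample covariance of class $C_k$.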
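A minimal sketch of the maximum likelihood estimates for two Gaussian classes, restricted to one dimension and assuming a shared variance: the prior is the class fraction, each mean is the class average, and the shared variance pools the squared deviations of both classes. The function name and data are illustrative.

```python
def fit_shared_gaussian(xs, targets):
    """ML estimates (prior of C1, mean of C1, mean of C2, shared variance).

    targets[n] is 1 for class C1 and 0 for class C2.
    """
    n = len(xs)
    c1 = [x for x, t in zip(xs, targets) if t == 1]
    c2 = [x for x, t in zip(xs, targets) if t == 0]
    prior1 = len(c1) / n
    mu1 = sum(c1) / len(c1)
    mu2 = sum(c2) / len(c2)
    # Pooled variance: (N1/N) * S1 + (N2/N) * S2 with per-class ML variances.
    var = (sum((x - mu1) ** 2 for x in c1) + sum((x - mu2) ** 2 for x in c2)) / n
    return prior1, mu1, mu2, var


xs = [1.0, 2.0, 3.0, 6.0, 8.0]
targets = [1, 1, 1, 0, 0]
prior1, mu1, mu2, var = fit_shared_gaussian(xs, targets)
```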
Probabilistic Discriminative Models
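Probabilistic generative model: the number of parameters grows quadratically with M (the number of dimensions).
However, logistic regression, $p(C_1|\boldsymbol{\phi}) = y(\boldsymbol{\phi}) = \sigma(\mathbf{w}^T\boldsymbol{\phi})$, has only M adjustable parameters.
Maximum likelihood solution for logistic regression: $p(\mathbf{t}|\mathbf{w}) = \prod_{n=1}^{N} y_n^{t_n}(1 - y_n)^{1 - t_n}$, with $y_n = \sigma(\mathbf{w}^T\boldsymbol{\phi}_n)$.
Error function (negative log likelihood): $E(\mathbf{w}) = -\ln p(\mathbf{t}|\mathbf{w}) = -\sum_{n=1}^{N}\{t_n\ln y_n + (1 - t_n)\ln(1 - y_n)\}$, with gradient $\nabla E(\mathbf{w}) = \sum_{n=1}^{N}(y_n - t_n)\boldsymbol{\phi}_n$.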
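The negative log likelihood of logistic regression and its gradient, sum_n (y_n - t_n) phi_n, can be sketched as below. Plain fixed-step gradient descent is used here purely for illustration (it is not claimed to be the method on the slide), and the step size, iteration count, and data are assumptions.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))


def neg_log_likelihood(w, features, targets):
    """E(w) = -sum_n [t_n ln y_n + (1 - t_n) ln(1 - y_n)]."""
    total = 0.0
    for phi, t in zip(features, targets):
        y = sigmoid(sum(wi * pi for wi, pi in zip(w, phi)))
        total -= t * math.log(y) + (1 - t) * math.log(1.0 - y)
    return total


def gradient(w, features, targets):
    """grad E(w) = sum_n (y_n - t_n) phi_n."""
    grad = [0.0] * len(w)
    for phi, t in zip(features, targets):
        y = sigmoid(sum(wi * pi for wi, pi in zip(w, phi)))
        for i, pi in enumerate(phi):
            grad[i] += (y - t) * pi
    return grad


def fit(features, targets, eta=0.05, steps=2000):
    """Fixed-step gradient descent on E(w), starting from w = 0."""
    w = [0.0] * len(features[0])
    for _ in range(steps):
        g = gradient(w, features, targets)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w


# Overlapping toy data; the first feature is the constant bias term.
features = [(1.0, float(x)) for x in range(6)]
targets = [0, 0, 1, 0, 1, 1]
w_fit = fit(features, targets)
```

The classes overlap on purpose: on perfectly separable data the likelihood has no finite maximizer and the weights would keep growing.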
Iterative Reweighted Least Squares
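Newton-Raphson iterative optimization: $\mathbf{w}^{(new)} = \mathbf{w}^{(old)} - \mathbf{H}^{-1}\nabla E(\mathbf{w})$.
Applied to linear regression with the sum-of-squares error: $\nabla E(\mathbf{w}) = \boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{w} - \boldsymbol{\Phi}^T\mathbf{t}$ and $\mathbf{H} = \boldsymbol{\Phi}^T\boldsymbol{\Phi}$, so $\mathbf{w}^{(new)} = (\boldsymbol{\Phi}^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{t}$.
This is the same as the standard least-squares solution, reached in a single step.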
Iterative Reweighted Least Squares
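Newton-Raphson update for the negative log likelihood: $\nabla E(\mathbf{w}) = \boldsymbol{\Phi}^T(\mathbf{y} - \mathbf{t})$ and $\mathbf{H} = \boldsymbol{\Phi}^T\mathbf{R}\boldsymbol{\Phi}$, where $\mathbf{R}$ is diagonal with $R_{nn} = y_n(1 - y_n)$.
$\mathbf{w}^{(new)} = \mathbf{w}^{(old)} - (\boldsymbol{\Phi}^T\mathbf{R}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T(\mathbf{y} - \mathbf{t}) = (\boldsymbol{\Phi}^T\mathbf{R}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\mathbf{R}\mathbf{z}$, with $\mathbf{z} = \boldsymbol{\Phi}\mathbf{w}^{(old)} - \mathbf{R}^{-1}(\mathbf{y} - \mathbf{t})$.
Each step is a weighted least-squares problem, and $\mathbf{R}$ must be recomputed after every update because it depends on $\mathbf{w}$.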
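A compact sketch of the Newton-Raphson (IRLS) iteration for logistic regression with features phi(x) = (1, x), so the Hessian is a 2x2 matrix that can be inverted by hand. It assumes the standard gradient Phi^T (y - t) and Hessian Phi^T R Phi with R_nn = y_n (1 - y_n); the iteration count and data are illustrative.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a)) if a >= 0 else math.exp(a) / (1.0 + math.exp(a))


def irls(xs, targets, iterations=25):
    """Newton-Raphson updates w <- w - H^{-1} grad for two parameters."""
    w0, w1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, t in zip(xs, targets):
            y = sigmoid(w0 + w1 * x)
            r = y * (1.0 - y)                # diagonal entry of R
            g0 += y - t
            g1 += (y - t) * x
            h00 += r
            h01 += r * x
            h11 += r * x * x
        det = h00 * h11 - h01 * h01
        # Explicit 2x2 inverse of the Hessian applied to the gradient.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (-h01 * g0 + h00 * g1) / det
        w0, w1 = w0 - step0, w1 - step1
    return w0, w1


# Overlapping toy data, so the maximum likelihood solution is finite.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
targets = [0, 0, 1, 0, 1, 1]
w0_irls, w1_irls = irls(xs, targets)
```

This toy data set is symmetric about x = 2.5, so the fitted model assigns probability 0.5 at that point, which gives a simple sanity check on the result.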
Maximum Margin Classifiers
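Support Vector Machines for the two-class problem: $y(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}) + b$, with targets $t_n \in \{-1, +1\}$.
Assuming a linearly separable data set, there exists at least one choice of $\mathbf{w}$ and $b$ that satisfies $t_n y(\mathbf{x}_n) > 0$ for all $n$.
Among these, we seek the one that gives the smallest generalization error.
Margin: the smallest distance between the decision boundary and any of the samples.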
Support Vector Machines
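Optimization of parameters, maximizing the margin: $\arg\max_{\mathbf{w}, b}\{\frac{1}{\|\mathbf{w}\|}\min_n[t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b)]\}$.
Maximizing the margin is equivalent to minimizing $\frac{1}{2}\|\mathbf{w}\|^2$ subject to the constraint $t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b) \ge 1$ for $n = 1, \dots, N$.
Introduction of Lagrange multipliers $a_n \ge 0$: $L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^{N} a_n\{t_n(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) + b) - 1\}$.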
Support Vector Machines - Lagrange Multipliers
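Minimizing with respect to $\mathbf{w}$ and $b$ and maximizing with respect to $\mathbf{a}$ gives $\mathbf{w} = \sum_{n=1}^{N} a_n t_n\boldsymbol{\phi}(\mathbf{x}_n)$ and $\sum_{n=1}^{N} a_n t_n = 0$.
The dual form: $\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N} a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$, with kernel $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$.
Quadratic programming problem: maximize $\tilde{L}(\mathbf{a})$ subject to $a_n \ge 0$ and $\sum_{n=1}^{N} a_n t_n = 0$.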
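Solving the quadratic program itself needs a QP solver, which is outside this sketch. Assuming the dual variables a_n are already available, the primal solution for a linear kernel follows from w = sum_n a_n t_n x_n, with b taken from the support vectors (those with a_n > 0), which satisfy t_n (w^T x_n + b) = 1. The three-point data set and its dual solution were worked out by hand for this example.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_from_dual(a, targets, points):
    """Recover (w, b) from dual variables, assuming a linear kernel."""
    dim = len(points[0])
    w = [sum(a_n * t_n * x[i] for a_n, t_n, x in zip(a, targets, points))
         for i in range(dim)]
    # Average t_n - w^T x_n over the support vectors for numerical stability.
    support = [(t_n, x) for a_n, t_n, x in zip(a, targets, points) if a_n > 1e-12]
    b = sum(t_n - dot(w, x) for t_n, x in support) / len(support)
    return w, b


def dual_objective(a, targets, points):
    """L(a) = sum_n a_n - 0.5 * sum_nm a_n a_m t_n t_m x_n^T x_m."""
    quad = sum(a[n] * a[m] * targets[n] * targets[m] * dot(points[n], points[m])
               for n in range(len(a)) for m in range(len(a)))
    return sum(a) - 0.5 * quad


# Hand-worked toy problem: the third point is not a support vector.
points = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]
targets = [1, -1, 1]
a = [0.25, 0.25, 0.0]
w_svm, b_svm = primal_from_dual(a, targets, points)
```

At the optimum the dual objective equals the primal objective, half the squared norm of w, which offers a quick consistency check for a candidate solution.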
Support Vector Machines
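Overlapping class distributions (data that are not linearly separable).
Slack variable $\xi_n \ge 0$: $\xi_n = 0$ for points on or inside the correct margin boundary, and $\xi_n = |t_n - y(\mathbf{x}_n)|$ otherwise, so it grows with the distance from the margin boundary.
To maximize the margin while penalizing points that lie on the wrong side of the margin boundary, minimize $C\sum_{n=1}^{N}\xi_n + \frac{1}{2}\|\mathbf{w}\|^2$ subject to $t_n y(\mathbf{x}_n) \ge 1 - \xi_n$ and $\xi_n \ge 0$.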
SVM-Overlapping Class Distributions
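The dual Lagrangian is identical to the separable case; only the constraints change, to the box constraints $0 \le a_n \le C$ together with $\sum_{n=1}^{N} a_n t_n = 0$.
This again represents a quadratic programming problem.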
Support Vector Machines
Relation to logistic regression: the hinge loss used in the SVM and the error function of logistic regression both approximate the ideal misclassification error (MCE).
Figure legend: black: MCE; blue: hinge loss; red: logistic regression; green: squared error.
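The four error functions can be written as functions of z = y(x) t, the signed score of a sample with target t in {-1, +1}. This is a sketch under two assumptions: the logistic error is rescaled by 1/ln 2 so that it passes through the point (0, 1), and a score of exactly zero is counted as an error.

```python
import math


def misclassification_error(z):
    """Ideal 0/1 error: 1 when the sample is on the wrong side."""
    return 0.0 if z > 0 else 1.0


def hinge_loss(z):
    """SVM hinge loss max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def logistic_error(z):
    """ln(1 + exp(-z)) divided by ln 2, so that the value at z = 0 is 1."""
    return math.log1p(math.exp(-z)) / math.log(2.0)


def squared_error(z):
    """(y - t)^2 rewritten as (1 - z)^2, using t^2 = 1."""
    return (1.0 - z) ** 2


zs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
table = [(z, misclassification_error(z), hinge_loss(z),
          logistic_error(z), squared_error(z)) for z in zs]
```

Both the hinge loss and the rescaled logistic error upper-bound the misclassification error, while the squared error also penalizes confidently correct samples with z > 1.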