1
Linear Models for Classification
Berkay Topçu
2
Linear Models for Classification
Goal: take an input vector x and assign it to one of K discrete classes Ck, where k = 1, ..., K.
Linear separation of classes: the decision boundaries are linear functions of the input x.
3
Generalized Linear Models
We wish to predict discrete class labels or, more generally, posterior class probabilities, which lie in the range (0, 1).
The classification model is a linear function of the parameters: y(x) = f(w^T x + w_0), where f(.) is a nonlinear activation function.
Classification can be performed directly in the original input space x, or on a fixed nonlinear transformation of the input variables using a vector of basis functions φ(x).
4
Discriminant Functions
Linear discriminant: y(x) = w^T x + w_0, where w is the weight vector and w_0 is the bias.
If y(x) ≥ 0, assign x to class C1, and to class C2 otherwise.
The decision boundary is given by y(x) = 0: w determines the orientation of the decision surface and w_0 determines its location.
Compact notation: y(x) = w~^T x~, where w~ = (w_0, w) and x~ = (1, x).
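As an illustration only (not from the slides), a minimal Python sketch of such a two-class linear discriminant, with the weights w and bias w_0 chosen arbitrarily:

```python
import numpy as np

# Hypothetical weights for a 2-D input; any w, w0 define one linear discriminant.
w = np.array([2.0, -1.0])   # orientation of the decision surface
w0 = 0.5                    # bias: shifts the surface away from the origin

def classify(x):
    """Assign x to C1 if y(x) = w^T x + w0 >= 0, else to C2."""
    return "C1" if w @ x + w0 >= 0 else "C2"

print(classify(np.array([1.0, 1.0])))    # y = 1.5  -> C1
print(classify(np.array([-1.0, 2.0])))   # y = -3.5 -> C2
```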
5
Multiple Classes
Build a K-class discriminant (K > 2) by combining a number of two-class discriminant functions.
One-versus-the-rest: K - 1 classifiers, each separating the points in one particular class Ck from the points not in that class.
One-versus-one: K(K-1)/2 binary discriminant functions, one for every possible pair of classes (a concrete count is given below).
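For a concrete count (simple arithmetic, not on the slide): with K = 4 classes, one-versus-the-rest uses K - 1 = 3 binary classifiers, while one-versus-one uses K(K-1)/2 = 4·3/2 = 6.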
6
Multiple Classes
A single K-class discriminant comprising K linear functions: y_k(x) = w_k^T x + w_k0.
Assign x to class Ck if y_k(x) > y_j(x) for all j ≠ k.
How do we learn the parameters of the linear discriminant functions?
7
Least Squares for Classification
Each class Ck is described by its own linear model y_k(x) = w_k^T x + w_k0, written jointly as y(x) = W~^T x~.
Training data set {x_n, t_n} for n = 1, ..., N, where t_n is a 1-of-K (one-hot) target vector.
Define the matrix T whose nth row is the vector t_n^T, and the matrix X~ whose nth row is x~_n^T.
8
Least Squares for Classification
Minimizing the sum-of-squares error function E(W~) = (1/2) Tr{ (X~ W~ - T)^T (X~ W~ - T) }.
Solution: W~ = (X~^T X~)^{-1} X~^T T = X~^† T, where X~^† is the pseudo-inverse of X~.
Discriminant function: y(x) = W~^T x~ = T^T (X~^†)^T x~.
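A minimal Python sketch of this least-squares solution on assumed toy data (two Gaussian blobs), using the pseudo-inverse and predicting by the largest output:

```python
import numpy as np

# Assumed toy data: X is N x D, t holds class labels in {0, ..., K-1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
t = np.array([0] * 20 + [1] * 20)
K = 2

X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend the bias feature
T = np.eye(K)[t]                                     # 1-of-K target matrix
W_tilde = np.linalg.pinv(X_tilde) @ T                # W~ = X~^dagger T

def predict(x):
    """Assign x to the class whose linear function gives the largest output."""
    y = W_tilde.T @ np.concatenate([[1.0], x])
    return int(np.argmax(y))

print(predict(np.array([0.0, 0.0])))   # expected: class 0
print(predict(np.array([3.0, 3.0])))   # expected: class 1
```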
9
Fisher’s Linear Discriminant
Dimensionality reduction: take the D-dimensional input vector x and project it to one dimension using y = w^T x.
Choose the projection that maximizes class separation.
Two-class problem: N1 points of class C1 and N2 points of class C2, with class means m1 and m2.
Fisher's idea: choose w to give a large separation between the projected class means and a small variance within each class, thereby minimizing class overlap.
10
Fisher’s Linear Discriminant
The Fisher criterion: J(w) = (m2 - m1)^2 / (s1^2 + s2^2), the separation of the projected class means divided by the within-class variance of the projected data.
Equivalently J(w) = (w^T S_B w) / (w^T S_W w), where S_B is the between-class and S_W the within-class covariance matrix.
Maximizing J(w) gives w ∝ S_W^{-1} (m2 - m1).
11
Fisher’s Linear Discriminant
For the two-class problem, the Fisher criterion is a special case of least squares (reference: Penalized Discriminant Analysis, Hastie, Buja and Tibshirani).
For multiple classes: the weight values are determined by the eigenvectors that correspond to the largest eigenvalues of S_W^{-1} S_B.
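A minimal sketch of the two-class case on assumed toy data, computing the Fisher direction w ∝ S_W^{-1}(m2 - m1):

```python
import numpy as np

# Two assumed Gaussian classes; find Fisher's projection direction.
rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, (30, 2))
X2 = rng.normal([2, 2], 1.0, (30, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)   # within-class scatter
w = np.linalg.solve(S_W, m2 - m1)                         # w proportional to S_W^{-1}(m2 - m1)

# The projected class means should be well separated along w.
print((X1 @ w).mean(), (X2 @ w).mean())
```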
12
The Perceptron Algorithm
The input vector x is first transformed using a fixed nonlinear transformation φ(x), giving y(x) = f(w^T φ(x)) with targets t ∈ {-1, +1}.
For all correctly classified training samples we would have w^T φ(x_n) t_n > 0.
Perceptron criterion: we need to minimize E_P(w) = - Σ_{n∈M} w^T φ(x_n) t_n, where M is the set of misclassified patterns.
13
The Perceptron Algorithm – Stochastic Gradient Descent
Cycle through the training patterns in turn. If a pattern is correctly classified, the weight vector remains unchanged; otherwise update it with w^(τ+1) = w^(τ) + η φ(x_n) t_n.
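A minimal sketch of this update rule, assuming toy linearly separable data and the simple choice φ(x) = (1, x), i.e. the raw inputs plus a bias feature:

```python
import numpy as np

# Assumed toy, linearly separable data; targets in {-1, +1}.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
t = np.array([-1] * 20 + [1] * 20)
Phi = np.hstack([np.ones((40, 1)), X])       # phi(x) = (1, x)

w = np.zeros(3)
eta = 1.0                                    # learning rate
for _ in range(100):                         # cycle through the patterns in turn
    for phi_n, t_n in zip(Phi, t):
        if (w @ phi_n) * t_n <= 0:           # misclassified pattern
            w = w + eta * phi_n * t_n        # perceptron update
print(w)
```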
14
Probabilistic Generative Models
Generative models depend on simple assumptions about the distribution of the data: model the class-conditional densities p(x|Ck) and the priors p(Ck), then use Bayes' theorem to obtain the posteriors p(Ck|x).
For two classes, p(C1|x) = σ(a) with a = ln[ p(x|C1) p(C1) / (p(x|C2) p(C2)) ].
Logistic sigmoid function: σ(a) = 1 / (1 + exp(-a)), which maps the whole real axis to the finite interval (0, 1).
15
Continuous Inputs - Gaussian
Assume the class-conditional densities p(x|Ck) are Gaussian with a shared covariance matrix Σ.
Case of two classes: the posterior becomes p(C1|x) = σ(w^T x + w_0), a logistic sigmoid of a linear function of x.
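The resulting parameters (the standard shared-covariance result, reconstructed here because the slide formulas did not survive extraction):

w = Σ^{-1} (μ1 - μ2)
w_0 = -(1/2) μ1^T Σ^{-1} μ1 + (1/2) μ2^T Σ^{-1} μ2 + ln( p(C1) / p(C2) )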
16
Maximum Likelihood Solution
Likelihood function (two classes, t_n = 1 for class C1, prior p(C1) = π):
p(t, X | π, μ1, μ2, Σ) = Π_n [ π N(x_n | μ1, Σ) ]^{t_n} [ (1 - π) N(x_n | μ2, Σ) ]^{1 - t_n}.
Maximizing the log-likelihood with respect to each parameter gives closed-form estimates (see below).
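The resulting maximum likelihood estimates (standard results for the shared-covariance Gaussian model, stated here since the slide equations were not captured):

π = N1 / N
μ1 = (1/N1) Σ_n t_n x_n,    μ2 = (1/N2) Σ_n (1 - t_n) x_n
Σ = (N1/N) S1 + (N2/N) S2,  where S_k is the covariance of the data assigned to class Ck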
17
Probabilistic Discriminative Models
In the probabilistic generative model the number of parameters grows quadratically with M (the number of dimensions).
However, logistic regression, p(C1|φ) = σ(w^T φ), has only M adjustable parameters.
Maximum likelihood solution for logistic regression: minimize the energy function, the negative log-likelihood (cross-entropy error)
E(w) = - Σ_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) },  where y_n = σ(w^T φ_n).
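For reference, the gradient of this error function takes the simple form ∇E(w) = Σ_n (y_n - t_n) φ_n (a standard result, added here because the IRLS updates on the next slides are built on it).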
18
Iterative Reweighted Least Squares
Newton-Raphson iterative optimization: w^(new) = w^(old) - H^{-1} ∇E(w), where H is the Hessian.
Applied to the linear regression sum-of-squares error it gives w^(new) = (Φ^T Φ)^{-1} Φ^T t in a single step, the same as the standard least-squares solution.
19
Iterative Reweighted Least Squares
Newton-Raphson update for the negative log-likelihood: w^(new) = w^(old) - (Φ^T R Φ)^{-1} Φ^T (y - t), where R is a diagonal matrix with R_nn = y_n (1 - y_n).
Equivalently w^(new) = (Φ^T R Φ)^{-1} Φ^T R z with z = Φ w^(old) - R^{-1}(y - t): a weighted least-squares problem whose weights R are recomputed at every iteration, hence iterative reweighted least squares (IRLS).
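A minimal IRLS sketch on assumed toy data (two Gaussian blobs), following the update above:

```python
import numpy as np

# Assumed toy two-class data with targets in {0, 1}.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 1, (30, 2)), rng.normal(1, 1, (30, 2))])
t = np.array([0.0] * 30 + [1.0] * 30)
Phi = np.hstack([np.ones((60, 1)), X])            # basis functions: bias plus raw inputs

w = np.zeros(3)
for _ in range(10):                               # a few Newton-Raphson steps usually suffice
    y = 1.0 / (1.0 + np.exp(-Phi @ w))            # y_n = sigma(w^T phi_n)
    R = np.diag(y * (1.0 - y))                    # reweighting matrix
    H = Phi.T @ R @ Phi                           # Hessian of the cross-entropy error
    w = w - np.linalg.solve(H, Phi.T @ (y - t))   # Newton-Raphson / IRLS update
print(w)
```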
20
Maximum Margin Classifiers
Support vector machines for the two-class problem: y(x) = w^T φ(x) + b, with targets t_n ∈ {-1, +1}.
Assuming a linearly separable data set, there exists at least one choice of the parameters (w, b) satisfying t_n y(x_n) > 0 for all n.
We seek the solution that gives the smallest generalization error.
Margin: the smallest distance between the decision boundary and any of the samples; the SVM chooses the boundary that maximizes it.
21
Support Vector Machines
Optimization of the parameters by maximizing the margin.
Maximizing the margin is equivalent to minimizing (1/2) ||w||^2, subject to the constraints t_n (w^T φ(x_n) + b) ≥ 1 for n = 1, ..., N.
The constrained problem is handled by the introduction of Lagrange multipliers a_n ≥ 0, one per constraint.
22
Support Vector Machines - Lagrange Multipliers
Minimizing the Lagrangian with respect to w and b and maximizing it with respect to a gives w = Σ_n a_n t_n φ(x_n) and Σ_n a_n t_n = 0.
The dual form: maximize L~(a) = Σ_n a_n - (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m), subject to a_n ≥ 0 and Σ_n a_n t_n = 0, where k(x, x') = φ(x)^T φ(x').
This is a quadratic programming problem: a quadratic function of a is optimized subject to linear constraints.
23
Support Vector Machines
Overlapping class distributions (data that are not linearly separable).
Slack variable ξ_n ≥ 0: the distance of a point from its margin boundary when it lies on the wrong side (ξ_n = 0 otherwise).
Goal: maximize the margin while penalizing points that lie on the wrong side of the margin boundary, i.e. minimize C Σ_n ξ_n + (1/2) ||w||^2, where C controls the trade-off between the slack penalty and the margin.
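For illustration only (the slides do not mention a specific library), a soft-margin SVM of this form can be fitted with scikit-learn; its parameter C plays exactly the trade-off role above:

```python
import numpy as np
from sklearn.svm import SVC

# Assumed toy data with overlapping classes.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-1, 1.5, (50, 2)), rng.normal(1, 1.5, (50, 2))])
t = np.array([-1] * 50 + [1] * 50)

# Small C: wide margin, many margin violations tolerated.
# Large C: narrow margin, violations penalized heavily.
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, t)
    print(C, clf.n_support_)                 # number of support vectors per class
```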
24
SVM-Overlapping Class Distributions
The dual Lagrangian is identical to the separable case, except that the multipliers are now box-constrained: 0 ≤ a_n ≤ C.
Again this represents a quadratic programming problem.
25
Support Vector Machines
Relation to logistic regression: the hinge loss used in the SVM and the error function of logistic regression can both be viewed as continuous approximations to the ideal misclassification error (MCE).
[Figure: loss functions plotted against y·t. Black: misclassification error, blue: hinge loss, red: logistic regression error, green: squared error.]
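For reference, the standard definitions of these losses in terms of z = t·y(x) (not written out on the slide):

hinge loss: E(z) = max(0, 1 - z)
logistic error: E(z) = ln(1 + exp(-z))   (in such comparisons it is often rescaled by 1/ln 2 so that it passes through (0, 1))
squared error: E(z) = (1 - z)^2
misclassification error: E(z) = 1 if z < 0, else 0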