Document Analysis: Linear Discrimination
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

Slide 2: Outline
- Introduction to linear discrimination
- Linear machines
- Generalized discriminant functions
- Augmented vectors and linear separability
- Objective functions and gradient descent procedure
- Perceptron (principle and algorithms)
- Perceptron with margins
- Relaxation with margins
- Principles of support vector machines

Slide 3: Principle of linear discrimination
- The principle consists in determining region boundaries (or, equivalently, discriminant functions) directly from training samples
- Additionally, these functions are assumed to be linear:
  - fast to compute
  - well-known properties
  - no loss of generality when combined with arbitrary feature transformations
- The problem of finding discriminant functions is stated as an optimization problem: minimizing an error cost on the training samples

Slide 4: Linear discriminant functions for two classes
- A linear discriminant function is written as g(x) = w^t x + w_0, where
  - x represents the feature vector of the sample to be classified
  - w is a weight vector and w_0 is the threshold weight (or bias), both of which have to be determined
- The equation g(x) = 0 defines the decision boundary between the two classes (a short code sketch follows)
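
A minimal sketch of such a two-class linear discriminant, assuming NumPy; the names classify_two_class, w, and w0 and the example weights are illustrative, not from the slides:

    import numpy as np

    def g(x, w, w0):
        """Linear discriminant g(x) = w^t x + w0."""
        return np.dot(w, x) + w0

    def classify_two_class(x, w, w0):
        """Assign class 1 if g(x) > 0, class 2 otherwise."""
        return 1 if g(x, w, w0) > 0 else 2

    # Example with illustrative 2-D weights
    w = np.array([1.0, -2.0])
    w0 = 0.5
    print(classify_two_class(np.array([3.0, 1.0]), w, w0))  # g = 1.5 > 0 -> class 1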

Slide 5: Geometrical interpretation
- The decision boundary is a hyperplane dividing the feature space into two half-spaces
- w is a normal vector of the hyperplane, since for any x_1 and x_2 belonging to the hyperplane, w^t (x_1 - x_2) = 0
- The distance of a point x to the hyperplane is |g(x)| / ||w||, since g(x) = w^t x + w_0
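
A short numeric check of the distance formula, assuming NumPy; the helper name distance_to_hyperplane is not from the slides:

    import numpy as np

    def distance_to_hyperplane(x, w, w0):
        """Distance from x to the hyperplane w^t x + w0 = 0, i.e. |g(x)| / ||w||."""
        return abs(np.dot(w, x) + w0) / np.linalg.norm(w)

    w = np.array([3.0, 4.0])   # ||w|| = 5
    w0 = -5.0
    print(distance_to_hyperplane(np.array([0.0, 0.0]), w, w0))  # |0 - 5| / 5 = 1.0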

Slide 6: Discrimination of multiple classes
- To discriminate c classes pairwise, c(c-1)/2 discriminant functions must be used
- The resulting decision regions do not produce a partition of the feature space
- Ambiguous regions appear!

Slide 7: Linear machines
- Multiple-class discrimination can be performed with exactly one function g_i(x) per class
- The decision consists in choosing the class ω_i that maximizes g_i(x)
- The decision boundary between ω_i and ω_j is given by the hyperplane H_ij defined by the equation g_i(x) - g_j(x) = 0
- The decision regions produce a partition of the feature space (see the sketch below)
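
A minimal sketch of a linear machine deciding by argmax over per-class discriminants, assuming NumPy; the names linear_machine, W, and w0 and the weights are illustrative, not from the slides:

    import numpy as np

    def linear_machine(x, W, w0):
        """Return the index of the class whose discriminant g_i(x) = W[i]^t x + w0[i] is largest."""
        scores = W @ x + w0          # one score per class
        return int(np.argmax(scores))

    # Example: 3 classes in a 2-D feature space (illustrative weights)
    W = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [-1.0, -1.0]])
    w0 = np.array([0.0, 0.0, 0.5])
    print(linear_machine(np.array([2.0, 1.0]), W, w0))  # scores [2.0, 1.0, -2.5] -> class 0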

Slide 8: Quadratic discriminant functions
- Discriminant functions can be generalized with quadratic terms in x
- The decision boundaries then become non-linear
- By extending the feature space with the quadratic terms as additional features, the decision boundaries become linear again (illustrated in the sketch below)
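
A small sketch of such a quadratic feature expansion in 2-D, assuming NumPy; the helper name quadratic_expand is not from the slides. A boundary that is quadratic in x becomes linear in the expanded features:

    import numpy as np

    def quadratic_expand(x):
        """Map (x1, x2) to (x1, x2, x1^2, x1*x2, x2^2)."""
        x1, x2 = x
        return np.array([x1, x2, x1 * x1, x1 * x2, x2 * x2])

    # The circle x1^2 + x2^2 - 1 = 0 is a linear boundary in the expanded space:
    # g(y) = a^t y + a0 with a = (0, 0, 1, 0, 1) and a0 = -1
    a, a0 = np.array([0.0, 0.0, 1.0, 0.0, 1.0]), -1.0
    print(a @ quadratic_expand(np.array([0.5, 0.5])) + a0)  # -0.5 -> inside the circle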

Slide 9: Generalized discriminant functions
- A more general approach consists of using generalized discriminant functions of the form g(x) = Σ_k a_k y_k(x) = a^t y, where the y_k(x) are arbitrary functions of x, possibly of some other dimension
- The decision boundaries are linear in the space of y, but not in the space containing x

Slide 10: Augmented vectors
- The principle of generalized discriminant functions can be applied to define augmented vectors a = (w_0, w_1, ..., w_d)^t and y = (1, x_1, ..., x_d)^t, so that g(x) = a^t y, with the bias w_0 included as a vector component
- The problem is formulated in a new space whose dimension is augmented by 1, where
  - the hyperplane a^t y = 0 passes through the origin
  - the distance from y to the hyperplane is equal to |a^t y| / ||a||
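
A one-line sketch of the augmentation, assuming NumPy; the helper name augment is not from the slides:

    import numpy as np

    def augment(x):
        """Prepend a constant 1 so that g(x) = w^t x + w0 becomes a^t y with a = (w0, w)."""
        return np.concatenate(([1.0], x))

    w, w0 = np.array([1.0, -2.0]), 0.5
    a = np.concatenate(([w0], w))
    x = np.array([3.0, 1.0])
    print(a @ augment(x), w @ x + w0)  # both evaluate to 1.5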

Slide 11: Linear separability
- Let us consider n samples {y_1, ..., y_n}, each labeled either ω_1 or ω_2
- We are looking for a separating vector a such that a^t y_i > 0 for samples labeled ω_1 and a^t y_i < 0 for samples labeled ω_2
- Each training sample puts a constraint on the solution region
- If such a vector exists, the two classes are said to be linearly separable
- By replacing every y_i labeled ω_2 by -y_i, we obtain the single condition a^t y_i > 0 for all i, which allows us to ignore the class labels (see the sketch below)
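
A short sketch of this normalization step, assuming NumPy arrays Y (one augmented sample per row) and labels in {1, 2}; the names normalize_samples, Y, and labels are not from the slides:

    import numpy as np

    def normalize_samples(Y, labels):
        """Negate the samples of class 2 so that a separating vector must satisfy a^t y > 0 for all rows."""
        signs = np.where(labels == 1, 1.0, -1.0)
        return Y * signs[:, None]

    Y = np.array([[1.0, 2.0, 0.0],    # augmented samples (leading 1 component)
                  [1.0, -1.0, 1.0]])
    labels = np.array([1, 2])
    print(normalize_samples(Y, labels))  # second row is negated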

Slide 12: Gradient descent procedure
- To find a vector a satisfying the set of inequalities a^t y_i > 0, we can minimize an objective function J(a) and apply a gradient descent procedure:
  - choose an initial a[0]
  - compute a[k+1] iteratively using a[k+1] = a[k] - η(k) ∇J(a[k]), where the learning rate η(k) > 0 controls the convergence
    - if η(k) is small, the convergence is slow
    - if η(k) is too large, the iteration may not converge
  - stop when the convergence criterion is reached (see the sketch after this list)
- The approach can be refined by a second-order method using the Hessian matrix
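
A generic sketch of the descent loop under these assumptions; the names grad_J, eta, tol, and max_iter are illustrative, not from the slides:

    import numpy as np

    def gradient_descent(grad_J, a0, eta=0.1, tol=1e-6, max_iter=1000):
        """Iterate a[k+1] = a[k] - eta * grad_J(a[k]) until the step becomes small."""
        a = np.asarray(a0, dtype=float)
        for _ in range(max_iter):
            step = eta * grad_J(a)
            a = a - step
            if np.linalg.norm(step) < tol:   # convergence criterion
                break
        return a

    # Example: minimize J(a) = ||a - 1||^2, whose gradient is 2*(a - 1)
    print(gradient_descent(lambda a: 2 * (a - 1.0), np.zeros(3)))  # close to [1, 1, 1]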

Slide 13: Objective functions
- Considering the set of misclassified samples Y = {y_i | a^t y_i ≤ 0}, the following objective functions can be considered:
  - the number of misclassified samples
  - the perceptron rule, minimizing the sum of distances from the misclassified samples to the decision boundary
  - the sum of squared distances of the misclassified samples
  - a criterion using margins

Slide 14: Illustrations of objective functions

Slide 15: Perceptron principle
- The objective function to be minimized is J_p(a) = Σ_{y ∈ Y} (-a^t y), where Y is the set of misclassified samples, and its gradient is ∇J_p(a) = Σ_{y ∈ Y} (-y)
- Thus, the update rule becomes a[k+1] = a[k] + η(k) Σ_{y ∈ Y} y
- At each step, the distance from the misclassified samples to the boundary is reduced
- If a solution exists, the perceptron always finds one
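
A minimal batch-perceptron sketch following this update rule, assuming normalized samples Y (rows that should satisfy a^t y > 0) and a constant learning rate; the name batch_perceptron and the toy data are not from the slides:

    import numpy as np

    def batch_perceptron(Y, eta=1.0, max_iter=1000):
        """Batch perceptron: add eta times the sum of misclassified samples at each step."""
        a = np.zeros(Y.shape[1])
        for _ in range(max_iter):
            misclassified = Y[Y @ a <= 0]          # samples with a^t y <= 0
            if len(misclassified) == 0:            # all samples on the right side
                return a
            a = a + eta * misclassified.sum(axis=0)
        return a

    # Tiny separable example in augmented, normalized form
    Y = np.array([[1.0, 2.0], [1.0, 1.0], [-1.0, 0.5]])
    print(batch_perceptron(Y))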

Slide 16: Perceptron algorithms
- The perceptron rule can be implemented in two ways:
  - Batch perceptron algorithm: at each step, (a multiple of) the sum of all misclassified samples is added to the weight vector
  - Iterative single-sample perceptron algorithm: at each step, one misclassified sample is added to the weight vector

    choose a; k = 0;
    repeat
      k = (k+1) mod n;
      if a.y[k] <= 0 then a = a + y[k];
    until a.y > 0 for all y
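
A runnable Python version of the single-sample variant under the same assumptions (normalized, linearly separable samples); the name single_sample_perceptron is not from the slides:

    import numpy as np

    def single_sample_perceptron(Y, max_passes=1000):
        """Cycle through the samples, adding each misclassified one to the weight vector."""
        a = np.zeros(Y.shape[1])
        for _ in range(max_passes):
            updated = False
            for y in Y:
                if a @ y <= 0:        # misclassified (or on the boundary)
                    a = a + y
                    updated = True
            if not updated:           # a^t y > 0 for all samples
                return a
        return a

    Y = np.array([[1.0, 2.0], [1.0, 1.0], [-1.0, 0.5]])
    print(single_sample_perceptron(Y))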

Slide 17: Final remarks on the perceptron
- The iterative single-sample perceptron algorithm terminates with a solution if and only if the classes are linearly separable
- The solution found is often not optimal with respect to generalization, since the solution vector often lies at the border of the solution region
- Variants exist that improve this behavior
- The perceptron rule is at the origin of a family of artificial neural networks called multi-layer perceptrons (MLP), which are of interest for pattern recognition

Slide 18: Discrimination with margin
- To improve the generalization behavior, the constraint a^t y > 0 can be replaced by a^t y > b, where b > 0 is called the margin
- The solution region is reduced by bands of width b / ||y_i||

Slide 19: Perceptron with margin
- The perceptron algorithm can be generalized by using margins: the update rule becomes a[k+1] = a[k] + η(k) y for every sample y with a^t y ≤ b
- It can be shown that, if the classes are linearly separable, the algorithm always finds a solution provided the learning rate η(k) satisfies suitable conditions
- This is the case for η(k) = 1 and for η(k) = 1/k
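
A single-sample sketch of the margin variant under these assumptions, using the decreasing rate η(k) = 1/k mentioned above; the names margin_perceptron, b, and max_updates are illustrative, not from the slides:

    import numpy as np

    def margin_perceptron(Y, b=1.0, max_updates=10000):
        """Single-sample perceptron with margin: update on every sample with a^t y <= b."""
        a = np.zeros(Y.shape[1])
        k = 0
        while k < max_updates:
            done = True
            for y in Y:
                if a @ y <= b:
                    k += 1
                    a = a + (1.0 / k) * y   # eta(k) = 1/k
                    done = False
            if done:                        # a^t y > b for all samples
                return a
        return a

    Y = np.array([[1.0, 2.0], [1.0, 1.0], [-1.0, 0.5]])
    print(margin_perceptron(Y, b=0.1))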

Slide 20: Relaxation procedure with margin
- The objective function of the perceptron is piecewise linear and its gradient is not continuous
- The relaxation procedure generalizes the approach by considering J_r(a) = 1/2 Σ_{y ∈ Y} (a^t y - b)² / ||y||², where Y contains all samples y for which a^t y ≤ b
- The gradient of J_r being ∇J_r(a) = Σ_{y ∈ Y} ((a^t y - b) / ||y||²) y, the update rule becomes a[k+1] = a[k] + η(k) Σ_{y ∈ Y} ((b - a^t y) / ||y||²) y

Slide 21: Relaxation algorithm
- The relaxation algorithm in batch mode is as follows:

    define b, eta[k]; choose a; k = 0;
    repeat
      k = k+1;
      sum = {0, ..., 0};
      for each y do
        if a.y <= b then sum = sum + (b - a.y)/(y.y) * y;
      a = a + eta[k] * sum;
    until a.y > b for all y

- There also exists a single-sample iterative version
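
A runnable Python transcription of this batch relaxation sketch, assuming normalized samples and a constant learning rate; the names batch_relaxation and eta and the toy data are illustrative:

    import numpy as np

    def batch_relaxation(Y, b=1.0, eta=1.2, max_iter=1000):
        """Batch relaxation with margin b: move a towards satisfying a^t y > b for all rows of Y."""
        a = np.zeros(Y.shape[1])
        for _ in range(max_iter):
            s = np.zeros_like(a)
            updated = False
            for y in Y:
                if a @ y <= b:
                    s += (b - a @ y) / (y @ y) * y
                    updated = True
            if not updated:              # a^t y > b for every sample
                return a
            a = a + eta * s              # eta typically chosen in (0, 2)
        return a

    Y = np.array([[1.0, 2.0], [1.0, 1.0], [-1.0, 0.5]])
    print(batch_relaxation(Y, b=0.5))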

Slide 22: Support vector machines
- Support vector machines (SVM) are based on similar considerations:
  - the feature space is mapped into a space of much higher dimension using a non-linear mapping, including for each pattern y_k a component y_k,0 = 1
  - for each pattern, let z_k = ±1 according to the class ω_1 or ω_2 the pattern y_k belongs to
  - let g(y) = a^t y be a linear discriminant; then a separating hyperplane ensures z_k g(y_k) > 0 for all k

Slide 23: SVM optimization criteria
- The goal of a support vector machine is to find the separating hyperplane with the largest margin
- Supposing a margin b > 0 exists, the goal is to find the vector a that maximizes b subject to z_k a^t y_k / ||a|| ≥ b for all k
- The points for which this constraint holds with equality, i.e. those lying exactly at distance b from the hyperplane, are called support vectors (see the sketch below)
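
A small illustration of the maximum-margin idea using scikit-learn's linear SVM, which solves an equivalent formulation; this is an illustrative sketch, not the training procedure from the slides, and the toy data X and labels z are invented:

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D data: two linearly separable classes with labels z in {-1, +1}
    X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 3.5],
                  [0.0, 0.0], [1.0, 0.5], [0.5, -1.0]])
    z = np.array([1, 1, 1, -1, -1, -1])

    clf = SVC(kernel="linear", C=1e6)   # a large C approximates the hard-margin case
    clf.fit(X, z)

    print("weights:", clf.coef_[0], "bias:", clf.intercept_[0])
    print("support vectors:", clf.support_vectors_)          # points lying on the margin
    print("margin width:", 2.0 / np.linalg.norm(clf.coef_[0]))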

Slide 24: Conclusion on SVM
- SVMs are still the subject of numerous research issues:
  - choice of basis functions
  - optimized training strategies
- SVMs are reputed to avoid overfitting and therefore to have good generalization properties