
1 LINEAR AND NON-LINEAR CLASSIFICATION USING SVM and KERNELS
Course: CS 698 | Current Topics in Data Science | Dr. Usman Roshan
By: Nehal Navande, Sneha Awasthi

2 LINEAR CLASSIFICATION USING SVM

3 SVM OVERVIEW
SVM Definition
Linear Classification
Optimally Separating Hyperplanes
Maximum Margin
Gradient Descent for Optimization
Learning factor η

4 SUPPORT VECTOR MACHINES
In machine learning, support vector machines (SVMs, also called support vector networks) are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
[Diagram: support vector machines sit inside supervised learning algorithms, which sit inside machine learning]

5 SVM
MACHINE LEARNING – Programming computers to optimize a performance criterion using example data or past experience.
SUPERVISED LEARNING ALGORITHMS – A supervised learning problem is defined by an input X and an output Y; the task is to learn the mapping from the input to the output, y = g(x|θ), where θ are the model parameters. Regression and classification are supervised learning problems.

6 LINEAR CLASSIFICATION
Assumes the instances of the classes are linearly separable.
Discriminant-based approach: uses the labeled samples to learn the boundary directly.
Both space and time complexity are O(d) (see the sketch below).
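As an illustrative sketch (the weights below are arbitrary, not values from the slides), a linear discriminant classifies a point by the sign of w·x + b, touching each of the d weights exactly once, which is where the O(d) time and space come from:

```python
# A minimal sketch (weights chosen arbitrarily for illustration): a linear
# discriminant g(x) = w.x + b classifies by sign, using O(d) time and space.
import numpy as np

w = np.array([2.0, -1.0])   # weight vector (d = 2)
b = -0.5                    # bias term

def classify(x):
    # one pass over the d weights: O(d) time, O(d) space
    return +1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([1.0, 0.5])))   # +1 (2.0 - 0.5 - 0.5 = 1.0 >= 0)
print(classify(np.array([0.0, 2.0])))   # -1 (-2.0 - 0.5 < 0)
```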

7 OPTIMALLY SEPARATING HYPERPLANES
The training samples, labelled as positive and negative points, are separated by a line (more generally, a hyperplane).
Objective – find the hyperplane that separates the training samples with maximum margin.
Margin – the minimum distance from the hyperplane to the nearest training data points.
Support vectors – the data points xᵢ that lie exactly on the margin boundaries of the hyperplane.

8 MAXIMUM MARGIN
The distance from any point x₀ to the hyperplane w·x + b = 0 is |w·x₀ + b| / ‖w‖, where w ∈ Rᴺ and b ∈ R.
For the support vectors we scale w and b so that |w·x + b| = 1; therefore the margin is 2/‖w‖.
[Figure: separating hyperplane w·x + b = 0 with margin boundaries w·x + b = −1 and w·x + b = +1]
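A hedged sketch of this (the toy points and the use of scikit-learn's SVC are assumptions, not part of the slides): fit a linear SVM and read the margin 2/‖w‖ off the learned parameters.

```python
# Fit a (near) hard-margin linear SVM on two separable blobs and recover
# w, b, the margin 2/||w||, and the support vectors from the model.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # negative class
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # positive class
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C approximates a hard margin

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)
```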

9 GRADIENT DESCENT
The gradient descent method is a way to find a local minimum of a function.
Objective: optimize the parameters to minimize the classification error on the training dataset.
Let w be the set of parameters and E(w|X) the error with parameters w for a given training set X:
w* = arg min_w E(w|X)
The gradient vector is the vector of partial derivatives of the error E(w):
∇E(w) = [∂E/∂w₁, ∂E/∂w₂, …, ∂E/∂w_d]ᵀ

10 GRADIENT DESCENT contd..
To achieve this:
w is initialized at random.
At each step, w is updated as wᵢ = wᵢ + Δwᵢ with Δwᵢ = −η ∂E/∂wᵢ,
so the value of w changes in the direction opposite to the gradient, where η is the step size or learning factor.
The procedure terminates when a minimum is reached, i.e. when the gradient vanishes. (A runnable sketch follows below.)
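A minimal runnable sketch of this update rule (the quadratic error E(w) = ‖Aw − b‖² and the random data are assumptions chosen for illustration, not from the slides):

```python
# Gradient descent: start from a random w, repeatedly step opposite to the
# gradient of an assumed error E(w) = ||Aw - b||^2, stop when the gradient ~ 0.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = rng.normal(size=20)

def error(w):
    return np.sum((A @ w - b) ** 2)

def gradient(w):
    return 2.0 * A.T @ (A @ w - b)   # ∇E(w) = 2 Aᵀ(Aw − b)

eta = 0.01                  # step size / learning factor η
w = rng.normal(size=3)      # w is selected at random
for _ in range(1000):
    g = gradient(w)
    if np.linalg.norm(g) < 1e-8:    # gradient ≈ 0: a minimum, terminate
        break
    w = w - eta * g         # value of w changes opposite to the gradient

print("w* =", w, " E(w*) =", error(w))
```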

11 Deciding on the learning factor is critical
If the step size η is too small, the procedure converges, but takes a long time to do so.
If the step size η is too large, the error may decrease initially, but the updates then overshoot and diverge.
(A tiny numeric illustration follows below.)
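A toy illustration of both failure modes (the function E(w) = w² is an assumption chosen so the behavior is easy to see):

```python
# Gradient descent on E(w) = w², whose gradient is 2w: a too-small η crawls
# toward 0, while a too-large η makes the iterates alternate and blow up.
def descend(eta, w=1.0, steps=10):
    trace = [w]
    for _ in range(steps):
        w = w - eta * 2.0 * w        # gradient of w² is 2w
        trace.append(round(w, 3))
    return trace

print("η = 0.01 (too small, slow):   ", descend(0.01))
print("η = 1.10 (too large, diverges):", descend(1.10))
```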

12 NON-LINEAR CLASSIFICATION USING KERNEL METHODS

13 OUTLINE
Non-linear Classification
Kernel Functions
Kernel Trick

14 NON-LINEAR CLASSIFICATION
Linearly separable data sets can be dealt with very conveniently. But what do we do if the data set does not allow classification by a linear classifier?
For example, the image below contrasts how easy it is to separate linearly separable data with how hard the non-linear case is.
Non-linear classification is used when the classes are not separable by a linear boundary.

15 KERNEL FUNCTIONS
For example, in the image below the data points belong to two main classes: an inner ring and an outer ring. Intuitively, just by looking at the image, one can conclude that these two classes are not linearly separable.

16 KERNEL FUNCTIONS (contd..)
However, it is intuitively clear that an elliptical or circular boundary can easily separate the two classes.
To classify such a feature space, a simple trick is to transform the two variables x and y into a new feature space involving x (or y) and a new variable z defined as z = sqrt(x² + y²).
The expression for z is nothing more than the equation for a circle: z is the distance from the origin.
When the data is transformed in this way, the resulting feature space involving x and z appears as shown below. This new problem in the x and z dimensions is linearly separable, and we can apply a standard SVM (see the sketch after this slide).
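A sketch of this ring example (the synthetic data generator and the use of scikit-learn's LinearSVC are assumptions): adding the feature z = sqrt(x² + y²) makes the two rings linearly separable.

```python
# Generate two concentric rings, add the radius feature z, and check that a
# plain linear SVM separates the classes in the (x, z) feature space.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def ring(radius, n=100, noise=0.1):
    theta = rng.uniform(0, 2 * np.pi, n)
    r = radius + rng.normal(0, noise, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

X = np.vstack([ring(1.0), ring(3.0)])     # inner ring and outer ring
y = np.array([0] * 100 + [1] * 100)

z = np.sqrt(X[:, 0] ** 2 + X[:, 1] ** 2)  # the new variable z
X_xz = np.column_stack([X[:, 0], z])      # feature space involving x and z

clf = LinearSVC().fit(X_xz, y)            # a standard linear SVM now suffices
print("training accuracy:", clf.score(X_xz, y))   # ≈ 1.0
```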

17 KERNEL FUNCTIONS (contd..)
A kernel maps the data into a new space and then takes the inner product of the new vectors.
In this way kernel functions turn non-linearly separable problems into linearly separable ones.
A kernel function can be viewed as a similarity measure: the more similar the points x and y are, the larger the value of K(x, y) should be.
When we run a linear SVM on such transformed data, classification accuracy can be nearly 100%.

18 POPULAR KERNELS
Some popular kernels are:
Linear kernel: one of the simplest kernels, just the inner product k(x, y) = ⟨x, y⟩
Polynomial kernel: for degree d, k(x, y) = (xᵗy + c)ᵈ
RBF (Radial Basis Function) kernel: k(x, y) = exp(−γ ‖x − y‖²)
where x and y are vectors in the input space. (A sketch of these kernels follows below.)
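A small sketch of the three kernels in plain NumPy (the parameter values c, d and γ are illustrative assumptions, not prescribed by the slides):

```python
# Each kernel takes two input-space vectors and returns a scalar similarity.
import numpy as np

def linear_kernel(x, y):
    return x @ y                              # <x, y>

def polynomial_kernel(x, y, c=1.0, d=3):
    return (x @ y + c) ** d                   # (xᵗy + c)^d

def rbf_kernel(x, y, gamma=0.5):
    return np.exp(-gamma * np.sum((x - y) ** 2))   # exp(-γ ||x - y||²)

x = np.array([1.0, 2.0])
y = np.array([2.0, 0.5])
print(linear_kernel(x, y))      # 3.0
print(polynomial_kernel(x, y))  # (3 + 1)^3 = 64.0
print(rbf_kernel(x, y))         # exp(-0.5 * 3.25) ≈ 0.197
```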

19 SVM WITH POLYNOMIAL KERNEL VISUALIZATION

20 KERNEL TRICK
A non-linear classification problem can be converted to a linear classification problem by mapping the input vectors into a higher-dimensional feature space.
The kernel trick is a mathematical tool that can be applied to any algorithm which depends solely on dot products between two vectors.
For the transformation of data into higher dimensions, we cannot just randomly project it, and we do not want to compute the mapping Φ explicitly either. Hence we use only algorithms that rely on dot products of vectors in the higher-dimensional space to come up with the boundary.
The kernel trick consists of replacing those dot products with an equivalent kernel function:
k(x, x′) = Φ(x)ᵗ Φ(x′)
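A hedged sketch tying the pieces together (the synthetic ring data and scikit-learn's SVC are assumptions): a kernelized SVM separates the rings directly in the input space, never computing the mapping Φ explicitly.

```python
# With kernel="rbf", every dot product Φ(x)·Φ(x') the SVM needs is replaced
# by k(x, x') = exp(-γ||x - x'||²), so Φ itself is never materialized.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def ring(radius, n=100, noise=0.1):
    theta = rng.uniform(0, 2 * np.pi, n)
    r = radius + rng.normal(0, noise, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

X = np.vstack([ring(1.0), ring(3.0)])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)
print("training accuracy:", clf.score(X, y))  # close to 1.0
```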

21 REFERENCES
Alpaydin, E. (2010). Introduction to Machine Learning. Cambridge, MA: MIT Press.
Mohri, M., Rostamizadeh, A. and Talwalkar, A. (2012). Foundations of Machine Learning (Adaptive Computation and Machine Learning Series). MIT Press.
Wikipedia (2018). Support vector machine. [online] [Accessed 28 Mar. 2018].

22 QUESTIONS

