Support Vector Data Description
D. M. J. Tax and R. P. W. Duin. Presented by Mihajlo Grbovic.

2 INTRODUCTION
The problem of data description, or One-Class Classification: make a description of a training set of objects and detect which (new) objects resemble this training set.
Data description can be used for:
1. Outlier detection: detecting uncharacteristic objects in a data set. Outliers often show an exceptionally large or small feature value compared with the other training objects. For many machine learning techniques it is useful to perform outlier detection first. Outliers can then be detected and rejected, which avoids unfounded confident classifications.

3 INTRODUCTION
2. Classification problems where one of the classes is undersampled. For example, measurements of a machine under normal working conditions are cheap and easy to obtain. Measurements of the outliers (when there are problems with the machine) would require damaging the machine in all possible ways, which is expensive.

4 INTRODUCTION
3. Comparison of two data sets. Suppose a classifier has been trained on some data after long optimization. When a similar problem has to be solved with new data, that data can first be compared with the old training set. If the new data is not comparable, a new classifier has to be trained.
[Figure: CLASSIFIER diagram with classes FEMALE and MALE (outputs 0 and 1) and a new data set marked "?"]

5 SOLUTIONS FOR SOLVING DATA DESCRIPTION
Most existing solutions focus on outlier detection.
Simplest solution:
- Generate outlier data around the target set.
- An ordinary classifier is then trained to distinguish between the target data and the outliers.
- This method requires near-target objects belonging to the outlier class. If they are not already available, they have to be created.
- It scales poorly in high-dimensional problems.
A Bayesian approach can also be used for detecting outliers:
- Instead of using the most probable weight configuration of a classifier to compute the output, the output is weighted by the probability that the weight configuration is correct given the data.
- This method can then provide an estimate of the probability of a certain object given the model family. Low probabilities indicate a possible outlier.
- The method is computationally expensive.
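The "simplest solution" above can be sketched as follows. It assumes the artificial outliers are drawn uniformly from an axis-aligned bounding box of the target set, enlarged on each side. The function name `generate_box_outliers` and the `margin` parameter are hypothetical. Training the ordinary classifier on targets versus generated outliers is not shown.

```python
import random


def generate_box_outliers(targets, n_outliers, margin=0.1, seed=0):
    """Draw artificial outliers uniformly from an enlarged bounding box of the targets.

    Hypothetical helper: each side of the axis-aligned bounding box is extended
    by `margin` times its width before sampling.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    dim = len(targets[0])
    lows, highs = [], []
    for k in range(dim):
        column = [x[k] for x in targets]
        lo, hi = min(column), max(column)
        pad = margin * (hi - lo)
        lows.append(lo - pad)
        highs.append(hi + pad)
    return [[rng.uniform(lows[k], highs[k]) for k in range(dim)]
            for _ in range(n_outliers)]


if __name__ == "__main__":
    targets = [[0.0, 0.0], [1.0, 0.5], [0.5, 1.0]]
    outliers = generate_box_outliers(targets, n_outliers=5)
    print(len(outliers), outliers[0])
```

The box volume is the product of its side lengths. The number of samples needed to populate the region close to the target set therefore grows quickly with the dimension, which is one way to read the remark that this approach scales poorly.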

6 SOLUTIONS FOR SOLVING DATA DESCRIPTION
Our solution: one-class classifiers
- One class is the target class, and all other data is outlier data.
- Create a spherically shaped boundary around the complete target set.
- To minimize the chance of accepting outliers, the volume of this description is minimized.
- Outlier sensitivity can be controlled by changing the ball-shaped boundary into a more flexible boundary.
- Example outliers can be included in the training procedure to find a more efficient description.

7 METHOD
We assume vectors $x$ are column vectors. We have a training set $\{x_i\}$, $i = 1, \dots, N$, for which we want to obtain a description. We further assume that the data shows variance in all feature directions.
NORMAL DATA DESCRIPTION
- The sphere is characterized by center $a$ and radius $R > 0$.
- We minimize the volume of the sphere by minimizing $R^2$, and demand that the sphere contains all training objects $x_i$.
- To allow the possibility of outliers in the training set, the squared distance from $x_i$ to the center $a$ need not be strictly smaller than $R^2$. Larger distances are penalized through slack variables $\xi_i$, and $C$ controls the trade-off between the volume of the sphere and these errors.
- Minimization problem: $F(R, a) = R^2 + C \sum_i \xi_i$ with constraints $\|x_i - a\|^2 \le R^2 + \xi_i$, $\xi_i \ge 0$ for all $i$.
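As a small numerical illustration of the objective, the sketch below evaluates $F(R, a)$ for a fixed, hand-picked center and radius. It does not optimize them. For fixed $a$ and $R$, the smallest slack that satisfies both constraints is $\xi_i = \max(0, \|x_i - a\|^2 - R^2)$. The function names are hypothetical.

```python
def squared_distance(x, a):
    """||x - a||^2 for plain Python sequences."""
    return sum((xk - ak) ** 2 for xk, ak in zip(x, a))


def svdd_primal_objective(data, center, radius_sq, C):
    """Return F(R, a) = R^2 + C * sum(xi_i) and the slacks xi_i for a fixed sphere.

    The smallest xi_i satisfying ||x_i - a||^2 <= R^2 + xi_i and xi_i >= 0
    is max(0, ||x_i - a||^2 - R^2).
    """
    slacks = [max(0.0, squared_distance(x, center) - radius_sq) for x in data]
    return radius_sq + C * sum(slacks), slacks


if __name__ == "__main__":
    data = [[0, 0], [1, 0], [0, 1], [3, 3]]  # the last object lies far away
    small, _ = svdd_primal_objective(data, [0, 0], 1.0, C=0.5)   # leave it outside
    large, _ = svdd_primal_objective(data, [0, 0], 18.0, C=0.5)  # enclose it
    print(small, large)
```

With this small $C$, leaving the remote object outside the sphere (positive slack) gives a lower objective than growing $R^2$ to enclose it.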

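8 METHOD
NORMAL DATA DESCRIPTION
- The constraints are incorporated with Lagrange multipliers $\alpha_i \ge 0$ and $\gamma_i \ge 0$:
$L(R, a, \alpha_i, \gamma_i, \xi_i) = R^2 + C \sum_i \xi_i - \sum_i \alpha_i \{R^2 + \xi_i - (\|x_i\|^2 - 2 a \cdot x_i + \|a\|^2)\} - \sum_i \gamma_i \xi_i$
- L should be minimized with respect to $R$, $a$, $\xi_i$ and maximized with respect to $\alpha_i$ and $\gamma_i$. Setting the partial derivatives to zero gives
$\sum_i \alpha_i = 1, \quad a = \sum_i \alpha_i x_i, \quad C - \alpha_i - \gamma_i = 0.$
- Substituting back gives the dual problem: maximize
$L = \sum_i \alpha_i (x_i \cdot x_i) - \sum_{i,j} \alpha_i \alpha_j (x_i \cdot x_j)$
subject to $0 \le \alpha_i \le C$ and $\sum_i \alpha_i = 1$.
- Support vectors are the objects with $\alpha_i > 0$:
  - An object inside the sphere ($\|x_i - a\|^2 < R^2$) gets $\alpha_i = 0$.
  - An object on the boundary can have $0 < \alpha_i < C$.
  - An object outside the sphere gets $\alpha_i = C$.
- $R^2$ is the squared distance from $a$ to any support vector with $0 < \alpha_i < C$.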
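The dual above is a small quadratic program. The sketch below solves it by projected gradient ascent. This solver is our own choice for illustration, not the optimizer used in the paper. The projection onto $\{0 \le \alpha_i \le C, \sum_i \alpha_i = 1\}$ is done by bisection on a shift. All function names are hypothetical. The inner product is passed in as a `kernel` argument, so the same code can later use a kernel function.

```python
def linear_kernel(x, y):
    """Plain inner product (x . y); a kernel function can be passed instead."""
    return sum(xk * yk for xk, yk in zip(x, y))


def project_box_simplex(v, C, iters=100):
    """Project v onto {alpha: 0 <= alpha_i <= C, sum(alpha) = 1} by bisection on a shift tau.

    The set is non-empty only if C >= 1 / len(v).
    """
    lo, hi = min(v) - C, max(v)  # clipped sum is len(v) * C >= 1 at lo and 0 at hi
    for _ in range(iters):
        tau = 0.5 * (lo + hi)
        if sum(min(C, max(0.0, vi - tau)) for vi in v) > 1.0:
            lo = tau
        else:
            hi = tau
    tau = 0.5 * (lo + hi)
    return [min(C, max(0.0, vi - tau)) for vi in v]


def svdd_fit(data, C, kernel=linear_kernel, iters=500, tol=1e-8):
    """Maximize sum_i alpha_i K_ii - sum_ij alpha_i alpha_j K_ij by projected gradient.

    Returns (alpha, R^2). Sketch only: fixed iteration count, O(n^2) work per step.
    """
    n = len(data)
    K = [[kernel(xi, xj) for xj in data] for xi in data]
    # Step 1/L with L = 2 * trace(K) >= 2 * lambda_max(K), valid because K is PSD.
    step = 1.0 / max(2.0 * sum(K[i][i] for i in range(n)), 1e-12)
    alpha = [1.0 / n] * n  # feasible start when C >= 1/n
    for _ in range(iters):
        Ka = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        # Gradient of the negated dual objective: 2 K alpha - diag(K)
        grad = [2.0 * Ka[i] - K[i][i] for i in range(n)]
        alpha = project_box_simplex([alpha[i] - step * grad[i] for i in range(n)], C)
    Ka = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
    aKa = sum(alpha[i] * Ka[i] for i in range(n))  # ||a||^2 with a = sum_i alpha_i x_i
    # R^2 = ||x_k - a||^2 for unbounded support vectors (tol < alpha_k < C - tol)
    sv = [k for k in range(n) if tol < alpha[k] < C - tol]
    if not sv:  # crude fallback for this sketch: average over all support vectors
        sv = [k for k in range(n) if alpha[k] > tol]
    radius_sq = sum(K[k][k] - 2.0 * Ka[k] + aKa for k in sv) / len(sv)
    return alpha, radius_sq


def svdd_accepts(z, data, alpha, radius_sq, kernel=linear_kernel):
    """Accept z when ||z - a||^2 <= R^2, with a = sum_i alpha_i x_i expanded via the kernel."""
    n = len(data)
    cross = sum(alpha[i] * kernel(z, data[i]) for i in range(n))
    aKa = sum(alpha[i] * alpha[j] * kernel(data[i], data[j])
              for i in range(n) for j in range(n))
    return kernel(z, z) - 2.0 * cross + aKa <= radius_sq


if __name__ == "__main__":
    square = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]]
    alpha, r2 = svdd_fit(square, C=1.0)
    print([round(a, 3) for a in alpha], round(r2, 3))
```

For the four corners of a square plus its center point, the corners should end up as support vectors with $\alpha_i \approx 0.25$. The interior point should get $\alpha_i = 0$, and $R^2$ should be about 2.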

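9 METHOD
SVDD with negative examples
- When negative examples (objects which should be rejected) are available, they can be incorporated in the training to improve the description.
- In contrast with the training (target) examples, which should be within the sphere, the negative examples should be outside it.
- With targets indexed by $i$ and negative examples by $l$, the minimization problem becomes $F(R, a, \xi_i, \xi_l) = R^2 + C_1 \sum_i \xi_i + C_2 \sum_l \xi_l$
- with constraints $\|x_i - a\|^2 \le R^2 + \xi_i$, $\|x_l - a\|^2 \ge R^2 - \xi_l$, $\xi_i \ge 0$, $\xi_l \ge 0$.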
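The extended objective can be illustrated the same way for a fixed, hand-picked sphere. Targets pay a slack when they fall outside the sphere; negative examples pay a slack when they fall inside it. This is a sketch with hypothetical names. It evaluates $F$ and does not optimize it.

```python
def squared_distance(x, a):
    """||x - a||^2 for plain Python sequences."""
    return sum((xk - ak) ** 2 for xk, ak in zip(x, a))


def svdd_negative_objective(targets, negatives, center, radius_sq, C1, C2):
    """F = R^2 + C1 * sum(xi_i) + C2 * sum(xi_l) for a fixed sphere (a, R).

    Targets should lie inside:    xi_i = max(0, ||x_i - a||^2 - R^2)
    Negatives should lie outside: xi_l = max(0, R^2 - ||x_l - a||^2)
    """
    xi_t = [max(0.0, squared_distance(x, center) - radius_sq) for x in targets]
    xi_n = [max(0.0, radius_sq - squared_distance(x, center)) for x in negatives]
    return radius_sq + C1 * sum(xi_t) + C2 * sum(xi_n)


if __name__ == "__main__":
    targets = [[0.0, 0.0], [1.0, 0.0]]
    negatives = [[0.5, 0.0]]  # a negative example between the two targets
    for r2 in (1.0, 0.5, 0.25):
        print(r2, svdd_negative_objective(targets, negatives, [0.0, 0.0], r2, C1=1.0, C2=2.0))
```

With the center fixed at the origin, no radius satisfies both constraints. Shrinking the sphere trades target slack against negative-example slack, weighted by $C_1$ and $C_2$. This is why the paper moves the center and, later, uses a more flexible boundary.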

10 METHOD
On the left: a normal data description with no outliers. The circled objects are support vectors, and 3 objects are required to describe the data set.
On the right: the same data set with one outlier object. A new description has to be computed to reject this outlier. With a minimal adjustment to the old description, the outlier is placed on the boundary of the description and becomes a support vector for the outlier class. Although the description is adjusted to reject the outlier object, it does not fit tightly around the rest of the target set. A more flexible description is required.

11 METHOD
Introducing kernel functions
- Replace the inner product $(x_i \cdot x_j)$ by a kernel function $K(x_i, x_j) = (\Phi(x_i) \cdot \Phi(x_j))$.
- This implicitly defines a mapping $\Phi$ of the data into another (possibly high-dimensional) feature space.
- An ideal kernel function would map the target data onto a bounded, spherically shaped area in the feature space and map the outlier objects outside this area.
1. The polynomial kernel: $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$, where $d$ is the degree.
For degree $d = 6$ the description is a sixth-order polynomial. Here the training objects most remote from the origin become support objects.
Problem: large regions of the input space without target objects will be accepted by the description.
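The sketch below evaluates the kernel in the inhomogeneous form $(x \cdot y + 1)^d$ written above; the name `polynomial_kernel` is hypothetical. It shows how strongly $K(x, x) = (\|x\|^2 + 1)^d$ depends on the norm of $x$ for $d = 6$. This is one informal way to see why objects far from the origin tend to become support objects.

```python
def polynomial_kernel(x, y, d):
    """Inhomogeneous polynomial kernel K(x, y) = (x . y + 1)^d of degree d."""
    return (sum(xk * yk for xk, yk in zip(x, y)) + 1) ** d


if __name__ == "__main__":
    # K(x, x) = (||x||^2 + 1)^d grows very fast with the distance of x from the origin.
    for r in (1, 2, 3):
        print(r, polynomial_kernel([r, 0], [r, 0], 6))
```

Tripling the distance from the origin (from 1 to 3) raises $K(x, x)$ from 64 to 1,000,000. Remote objects therefore dominate the kernel values.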

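12 METHOD
2. The Gaussian kernel: $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / s^2)$
- For small values of $s$ all objects become support vectors.
- A test object $z$ is accepted when
$\sum_i \alpha_i \exp(-\|z - x_i\|^2 / s^2) \ge \frac{1}{2}(B - R^2)$, with $B = 1 + \sum_{i,j} \alpha_i \alpha_j K(x_i, x_j)$.
This follows from $\|\Phi(z) - a\|^2 \le R^2$ and $K(z, z) = 1$.
- For very large $s$ the solution approximates the original spherically shaped solution.
- Decreasing the parameter $C$ constrains the values of $\alpha_i$ more, and more objects become support vectors.
- With decreasing $C$ the error on the target class also increases, but the volume covered by the data description decreases.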
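The acceptance rule can be checked on a tiny example. In practice $\alpha$ and $R^2$ come from solving the dual with the Gaussian kernel. Here we assume two training objects, $(0, 0)$ and $(2, 0)$, with $\alpha = (\tfrac{1}{2}, \tfrac{1}{2})$, which is the dual solution by symmetry when $C \ge \tfrac{1}{2}$. The function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, s):
    """K(x, y) = exp(-||x - y||^2 / s^2); note K(x, x) = 1 for every x."""
    return math.exp(-sum((xk - yk) ** 2 for xk, yk in zip(x, y)) / s ** 2)


def alpha_k_alpha(data, alpha, s):
    """sum_ij alpha_i alpha_j K(x_i, x_j)."""
    n = len(data)
    return sum(alpha[i] * alpha[j] * gaussian_kernel(data[i], data[j], s)
               for i in range(n) for j in range(n))


def gaussian_radius_sq(k, data, alpha, s):
    """R^2 = ||Phi(x_k) - a||^2 for an unbounded support vector x_k."""
    cross = sum(a_i * gaussian_kernel(data[k], x_i, s) for a_i, x_i in zip(alpha, data))
    return 1.0 - 2.0 * cross + alpha_k_alpha(data, alpha, s)


def gaussian_svdd_accepts(z, data, alpha, radius_sq, s):
    """Accept z when sum_i alpha_i K(z, x_i) >= (B - R^2) / 2, B = 1 + sum_ij alpha_i alpha_j K_ij."""
    B = 1.0 + alpha_k_alpha(data, alpha, s)
    score = sum(a_i * gaussian_kernel(z, x_i, s) for a_i, x_i in zip(alpha, data))
    return score >= 0.5 * (B - radius_sq)


if __name__ == "__main__":
    data, alpha = [[0.0, 0.0], [2.0, 0.0]], [0.5, 0.5]
    for s in (1.0, 2.0):
        r2 = gaussian_radius_sq(0, data, alpha, s)
        print(s, gaussian_svdd_accepts([1.0, 0.0], data, alpha, r2, s))
```

With $s = 1$ the midpoint $(1, 0)$ between the two training objects is rejected, because the description wraps each object tightly. With $s = 2$ it is accepted, in line with the remark that larger $s$ moves the solution toward the spherical one.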

13 METHOD
- One-class classifiers that use only information from the target set perform worse than classifiers that use information from both the target and the outlier data. In some cases, however, their performance is still comparable.
- In most cases the data descriptions with the polynomial kernel perform worse than those with the Gaussian kernel; there are only a few exceptions.
SVDD characteristics
- The minimum number of support vectors indicates the minimal target error that can be achieved (essential support vectors are introduced for this).
- Leave-one-out error estimate on the target set: $E_{\mathrm{LOO}} \le \#SV / N$, the fraction of training objects that are support vectors.
- With increasing dimensionality, the volume of the outlier block tends to grow faster than the volume of the target class. The overlap between target and outlier data decreases and the classification problem becomes easier, but more data is needed to estimate the boundary.
- The parameter $s$ can be set to give the desired number of support vectors.
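The dimensionality remark can be made concrete under a simplifying assumption. Assume the target class fills a unit hypersphere and the outlier block is its bounding hypercube $[-1, 1]^d$. The ratio of their volumes, $\pi^{d/2} / (\Gamma(d/2 + 1)\, 2^d)$, shrinks rapidly with $d$. The function name is hypothetical.

```python
import math


def ball_to_cube_volume_ratio(d):
    """Volume of the unit d-ball divided by the volume of its bounding cube [-1, 1]^d."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    return ball_volume / 2 ** d


if __name__ == "__main__":
    for d in (2, 5, 10, 20):
        print(d, ball_to_cube_volume_ratio(d))
```

In 2 dimensions the sphere covers $\pi / 4 \approx 79\%$ of the block. In 10 dimensions it covers well under 1%, so uniformly drawn outliers rarely overlap the target class.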

14 EXPERIMENTS
- How does SVDD perform on a real one-class classification problem, compared with other methods: Gaussian, Parzen density, mixture of Gaussians, and kNN?
- Machine diagnostics problem: the characterization of a submersible water pump.
- Target objects: measurements on a normally operating pump.
- Outlier objects (negative examples): measurements on a damaged pump.

15 EXPERIMENTS
- Good discrimination between target and outlier objects means both a small fraction of outliers accepted and a large fraction of target objects accepted.
Data set 1: the whole working area.
[Figure panels: 64 D; 15 D (reduced by PCA)]
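The two quantities in the criterion above can be computed from any one-class classifier's accept/reject decisions on labeled test objects. A minimal sketch follows; the name `acceptance_rates` is hypothetical.

```python
def acceptance_rates(accepted, is_target):
    """Return (fraction of target objects accepted, fraction of outlier objects accepted)."""
    target_flags = [a for a, t in zip(accepted, is_target) if t]
    outlier_flags = [a for a, t in zip(accepted, is_target) if not t]
    if not target_flags or not outlier_flags:
        raise ValueError("need at least one target and one outlier object")
    return sum(target_flags) / len(target_flags), sum(outlier_flags) / len(outlier_flags)


if __name__ == "__main__":
    accepted = [True, True, False, True, False, False]
    is_target = [True, True, True, False, False, False]
    print(acceptance_rates(accepted, is_target))
```

A good description pushes the first number toward 1 and the second toward 0. Sweeping a threshold (or $C$, $s$) traces the trade-off between the two.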

16 EXPERIMENTS
Data sets 2 to 5: approximations of the first one.
- In almost all cases SVDD obtains better performance.
- The other methods improve when the dimensionality is reduced, which is not useful in practice.

17 CONCLUSION AND COMMENTS
CONCLUSION
It is possible to solve the multidimensional outlier detection problem by obtaining a boundary around the data. Inspired by Support Vector Machines, this boundary can be described by a few support vectors.
STRONG POINTS
- Shows comparable or better results for sparse and complex data sets.
- Needs less training data than other methods.
WEAK POINTS
- Sets of outliers can be “good” or “poor”, in which case manual optimization of C1 and C2 is required.
- Selection of s and C can sometimes be hard.
- For large sample sizes, density estimation methods are preferred.

