Feature Extraction for Outlier Detection in High-Dimensional Spaces
Hoang Vu Nguyen, Vivekanand Gopalkrishnan
Motivation
- Outlier detection techniques compute distances between points in the full feature space, and thus suffer from the curse of dimensionality.
- Solution: feature extraction.
- However, existing feature extraction techniques do not consider class imbalance, so they are not suitable for asymmetric classification (and outlier detection!).
Overview
DROUT: Dimensionality Reduction/Feature Extraction for OUTlier Detection
- Extracts features for the detection process
- Designed to be integrated with outlier detectors
- Pipeline: training set → DROUT → features; features + testing set → detector → outliers
Background
Training set:
- Normal class ω_m: cardinality N_m, mean vector μ_m, covariance matrix Σ_m
- Anomaly class ω_a: cardinality N_a, mean vector μ_a, covariance matrix Σ_a
- N_m >> N_a; total number of points: N_t = N_m + N_a
Scatter matrices:
- Σ_w = (N_m/N_t)·Σ_m + (N_a/N_t)·Σ_a
- Σ_b = (N_m/N_t)·(μ_m − μ_t)(μ_m − μ_t)^T + (N_a/N_t)·(μ_a − μ_t)(μ_a − μ_t)^T
- Σ_t = Σ_w + Σ_b
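The scatter-matrix definitions above can be sketched directly in numpy. This is an illustrative helper (the function name and variable names are my own, not from the slides); it uses maximum-likelihood covariances so that the identity Σ_t = Σ_w + Σ_b holds exactly:

```python
import numpy as np

def scatter_matrices(X_m, X_a):
    """Within-class, between-class, and total scatter for a normal
    class X_m and an anomaly class X_a (rows are points)."""
    N_m, N_a = len(X_m), len(X_a)
    N_t = N_m + N_a
    mu_m, mu_a = X_m.mean(axis=0), X_a.mean(axis=0)
    mu_t = (N_m * mu_m + N_a * mu_a) / N_t      # grand mean
    # Per-class covariances (bias=True divides by N, not N-1)
    S_m = np.cov(X_m, rowvar=False, bias=True)
    S_a = np.cov(X_a, rowvar=False, bias=True)
    S_w = (N_m / N_t) * S_m + (N_a / N_t) * S_a
    S_b = (N_m / N_t) * np.outer(mu_m - mu_t, mu_m - mu_t) \
        + (N_a / N_t) * np.outer(mu_a - mu_t, mu_a - mu_t)
    S_t = S_w + S_b
    return S_w, S_b, S_t
```

With these conventions, S_t coincides with the covariance of the pooled training set, which is a quick sanity check on the decomposition.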
Background (cont.)
Eigenspace of a scatter matrix Σ (spanned by its eigenvectors):
- Consists of 3 subspaces: principal, noise, and null
- Solving the eigenvalue problem yields d eigenvalues v_1 ≥ v_2 ≥ … ≥ v_d
- The noise and null subspaces are caused by noise and, mainly, by insufficient training data
- Existing methods discard the noise and null subspaces → loss of information
- Jiang et al. 2008: regularize all 3 subspaces before performing feature extraction
[Figure: plot of eigenvalues, with principal (P) and noise (N) regions and the null space marked at indices 1, m, r, d]
DROUT Approach
Weight-adjusted within-class scatter matrix:
- Σ_w = (N_m/N_t)·Σ_m + (N_a/N_t)·Σ_a
- Since N_m >> N_a, Σ_a is far less reliable than Σ_m
- Weighting Σ_m and Σ_a by (N_m/N_t) and (N_a/N_t) when doing feature extraction on Σ_w (using PCA, etc.) means dimensions (eigenvectors) determined mainly by small eigenvalues of Σ_m get unexpectedly removed → the extracted dimensions are not really relevant for the asymmetric classification task
  [Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009]
Solution:
- Σ_w = w_m·Σ_m + w_a·Σ_a with w_m < w_a and w_m + w_a = 1 → more suitable for asymmetric classification
DROUT Approach (cont.)
Which matrix to regularize first?
- Goal: extract features that minimize the within-class variances and maximize the between-class variances
- Within-class variances are estimated from limited training data → small estimated variances tend to be unstable and cause overfitting → proceed by regularizing the 3 subspaces of the adjusted within-class scatter matrix
DROUT Approach (cont.)
Subspace decomposition:
- Solve the eigenvalue problem on the (weight-adjusted) Σ_w to obtain eigenvectors {e_1, e_2, …, e_d} with corresponding eigenvalues v_1 ≥ v_2 ≥ … ≥ v_d
- Identify m:
  v_med = median_{i ≤ r} {v_i}
  v_{m+1} = max_{i ≤ r} {v_i | v_i < 2·v_med − v_r}
[Figure: plot of eigenvalues with principal (P), noise (N), and null regions at indices 1, m, r, d]
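The decomposition step above can be sketched as follows. This is an assumption-laden illustration, not the authors' code: the function name is mine, r is taken to be the numerical rank of Σ_w (eigenvalues above a small tolerance), and m is returned as the number of principal eigenvalues, so that v_{m+1} is the largest eigenvalue below the 2·v_med − v_r threshold:

```python
import numpy as np

def decompose(S_w, eps=1e-10):
    """Eigendecompose the within-class scatter; return eigenvalues
    sorted descending, matching eigenvectors, the rank r, and the
    principal/noise split point m (number of principal eigenvalues)."""
    vals, vecs = np.linalg.eigh(S_w)          # ascending order
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]  # now v_1 >= ... >= v_d
    r = int(np.sum(vals > eps))               # rank: null space below eps
    v_med = np.median(vals[:r])
    v_r = vals[r - 1]
    # In descending order, the first eigenvalue under the threshold
    # is the largest one under it, i.e. v_{m+1}
    below = np.where(vals[:r] < 2 * v_med - v_r)[0]
    m = below[0] if below.size else r
    return vals, vecs, r, m
```

On a diagonal toy matrix the rule is easy to trace by hand: for eigenvalues (10, 9, 8, 1, 0.5, 0.1), v_med = 4.5 and the threshold is 8.9, so v_3 = 8 is the first eigenvalue below it and m = 2.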
DROUT Approach (cont.)
Subspace regularization:
- a = v_1·v_m·(m − 1)/(v_1 − v_m)
- b = (m·v_m − v_1)/(v_1 − v_m)
- Regularized eigenvalues:
  i ≤ m: x_i = v_i
  m < i ≤ r: x_i = a/(i + b)
  r < i ≤ d: x_i = a/(r + 1 + b)
- A = [e_i·w_i]_{1 ≤ i ≤ d} where w_i = 1/√x_i
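A minimal sketch of this regularization step, assuming the eigenvalues arrive sorted descending and m, r use the 1-based indexing of the slide (function and variable names are illustrative). The 1/(i + b) model is fitted through (1, v_1) and (m, v_m), so the regularized curve joins the kept eigenvalues continuously:

```python
import numpy as np

def regularize(vals, vecs, r, m):
    """Build A = [e_i * w_i] with w_i = 1/sqrt(x_i), where the noise
    and null eigenvalues are replaced by the a/(i+b) model."""
    v1, vm = vals[0], vals[m - 1]             # v_1 and v_m (1-based)
    a = v1 * vm * (m - 1) / (v1 - vm)
    b = (m * vm - v1) / (v1 - vm)
    i = np.arange(1, len(vals) + 1)           # 1-based eigenvalue index
    x = np.where(i <= m, vals,                # principal: keep v_i
        np.where(i <= r, a / (i + b),         # noise: model value
                 a / (r + 1 + b)))            # null: constant floor
    return vecs * (1.0 / np.sqrt(x))          # scale column i by w_i
```

Note that at i = 1 and i = m the model reproduces v_1 and v_m exactly, which is a useful check, and that the null-space weight a/(r + 1 + b) is finite even though the raw eigenvalues there are zero; this is precisely what lets DROUT keep all three subspaces instead of discarding two.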
DROUT Approach (cont.)
Feature extraction (after regularization):
- Transform each data point p: p′ = A^T·p
- Form the new (weight-adjusted) total scatter matrix (see the Background slide) and solve the eigenvalue problem on it
- B = matrix of the c resulting eigenvectors with the largest eigenvalues
- Feature extraction is done only after regularization → limits the loss of information
  [Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008]
- Transform matrix: M = A·B
DROUT Approach (cont.)
Summary:
1. Let Σ_w = w_m·Σ_m + w_a·Σ_a
2. Compute A from Σ_w
3. Transform the training set using A
4. Compute the new total scatter matrix Σ_t
5. Compute B by solving the eigenvalue problem on Σ_t
6. M = A·B
7. Use M to transform the testing set
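The summary above can be strung together end to end. This is a simplified sketch, not the paper's implementation: it inlines the steps, uses the pooled covariance of the transformed training set as a stand-in for the weight-adjusted total scatter, and replaces the a/(i + b) regularization with a plain eigenvalue floor to keep the example short; all names are mine:

```python
import numpy as np

def drout_transform(X_m, X_a, w_m=0.1, w_a=0.9, c=2):
    """Sketch of the DROUT pipeline: weight-adjusted S_w -> whitening
    matrix A -> total scatter of transformed data -> top-c basis B -> M."""
    # Step 1: weight-adjusted within-class scatter (w_m < w_a, sum to 1)
    S_m = np.cov(X_m, rowvar=False, bias=True)
    S_a = np.cov(X_a, rowvar=False, bias=True)
    S_w = w_m * S_m + w_a * S_a
    # Step 2: compute A (simplified regularization: floor tiny
    # eigenvalues instead of the slide's a/(i+b) model)
    vals, vecs = np.linalg.eigh(S_w)
    vals, vecs = vals[::-1], vecs[:, ::-1]    # descending order
    x = np.maximum(vals, 1e-3 * vals[0])
    A = vecs / np.sqrt(x)
    # Steps 3-4: transform the training set, new total scatter
    Y = np.vstack([X_m, X_a]) @ A
    S_t = np.cov(Y, rowvar=False, bias=True)
    # Step 5: B = c eigenvectors of S_t with largest eigenvalues
    tvals, tvecs = np.linalg.eigh(S_t)
    B = tvecs[:, ::-1][:, :c]
    # Step 6: final transform matrix; apply to test points as p @ M
    return A @ B
```

The default weights w_m = 0.1 and w_a = 0.9 follow the parameter settings reported later in the deck; a test point p is then mapped to the c extracted features via pᵀM before being handed to the detector.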
Related Work
APCDA [Xudong Jiang: Asymmetric principal component and discriminant analyses for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell., 31(5), 2009]
- Uses weight-adjusted scatter matrices for feature extraction
- Discards the noise and null subspaces → loss of information
ERE [Xudong Jiang, Bappaditya Mandal, and Alex ChiChung Kot: Eigenfeature regularization and extraction in face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 30(3):383–394, 2008]
- Performs regularization before feature extraction
- Ignores class imbalance → not suitable for outlier detection
ACP [David Lindgren and Per Spangeus: A novel feature extraction algorithm for asymmetric classification. IEEE Sensors Journal, 4(5):643–650, 2004]
- Considers neither the noise/null subspaces nor class imbalance
Outlier Detection with DROUT
Detectors:
- ORCA [Stephen D. Bay and Mark Schwabacher: Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In KDD, pages 29–38, 2003]
- BSOUT [George Kollios, Dimitrios Gunopulos, Nick Koudas, and Stefan Berchtold: Efficient biased sampling for approximate clustering and outlier detection in large data sets. IEEE Trans. Knowl. Data Eng., 15(5):1170–1187, 2003]
Outlier Detection with DROUT (cont.)
Datasets:
- KDD Cup 1999: normal class (60,593 records) vs. U2R class (246 records); d = 34 (7 categorical attributes excluded); training set: 1,000 normal vs. 50 anomalous records
- Ann-thyroid 1: class 3 vs. class 1; d = 21; training set: 450 normal vs. 50 anomalous records
- Ann-thyroid 2: class 3 vs. class 2; d = 21; training set: 450 normal vs. 50 anomalous records
Parameter settings:
- w_m = 0.1 and w_a = 0.9
- Number of extracted features c ≤ d/2
Results
[Results were presented as figures in the original slides and are not reproduced here]
Conclusion
Summary of contributions:
- Explored the effect of feature extraction on outlier detection
- A novel framework combining feature extraction with outlier detection
- Results on real datasets with two detection methods are promising
Future work:
- More experiments on larger datasets
- Examine other approaches to dimensionality reduction
Last words…