Sensitivity of PCA for Traffic Anomaly Detection
Evaluating the robustness of current best practices
Haakon Ringberg (1), Augustin Soule (2), Jennifer Rexford (1), Christophe Diot (2)
(1) Princeton University, (2) Thomson Research
Outline
- Background and motivation
  - Traffic anomaly detection
  - PCA and subspace approach
- Problems with methodology
- Conclusion & future directions
A network in the Internet
Network anomalies
- We want to be able to detect these anomalies!
Network anomaly detectors
- Monitor the health of the network
- Report anomalies in real time
Principal Components Analysis (PCA)
- Benefits
  - Finds correlations across multiple links
  - Network-wide analysis [Lakhina SIGCOMM'04]
  - Demonstrated ability to detect a wide variety of anomalies [Lakhina IMC'04]
- Subspace methodology
  - We use the same software as Lakhina et al.
Principal Components Analysis (PCA)
- PCA transforms the data into a new coordinate system
- The principal components (new bases) are ordered by captured variance
- The first k tend to capture periodic trends
  - They span the normal subspace; the remaining components span the anomalous subspace
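A minimal sketch of this split, assuming the traffic is a mean-centered matrix with one row per timebin and one column per link. The function name `split_subspaces`, the synthetic one-week dataset and the choice k = 1 are illustrative assumptions, not the software used in this study. PCA is computed here with NumPy's SVD, whose right singular vectors are the principal axes in decreasing order of captured variance.

```python
import numpy as np


def split_subspaces(Y, k):
    """Split mean-centered traffic Y (timebins x links) into its projection on the
    top-k principal axes (normal part) and the remainder (residual part)."""
    # Rows of Vt are the principal axes, sorted by decreasing singular value,
    # i.e. by decreasing captured variance.
    _, s, Vt = np.linalg.svd(Y, full_matrices=False)
    P = Vt[:k].T                          # links x k basis of the normal subspace
    Y_normal = Y @ P @ P.T                # projection onto the normal subspace
    Y_resid = Y - Y_normal                # projection onto the anomalous subspace
    variance = s**2 / (Y.shape[0] - 1)    # variance captured by each axis
    return Y_normal, Y_resid, variance


# Hypothetical data: one week of 5-minute bins on 20 links sharing a daily cycle.
rng = np.random.default_rng(0)
t = np.arange(7 * 288)
Y = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
Y += 0.1 * rng.standard_normal(Y.shape)
Y -= Y.mean(axis=0)

Y_normal, Y_resid, variance = split_subspaces(Y, k=1)
print(f"variance on first axis: {variance[0] / variance.sum():.3f}")
```

In this toy data the single daily cycle dominates, so the first axis carries almost all of the variance; real traffic gives no such clean separation.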
Pictorial overview of subspace methodology
1. Training: separate normal and anomalous traffic patterns
2. Detection: find spikes
3. Identification: find the original spatial location that caused the spike (e.g., router, flow)
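Training and detection can be sketched as follows, assuming the squared-prediction-error (SPE) test with the Jackson-Mudholkar Q-statistic threshold, the threshold described for Lakhina et al.'s subspace method. This is only a sketch: the function names, the confidence level α = 0.001, the injected spike and the synthetic data are hypothetical, and identification is not covered.

```python
import numpy as np
from statistics import NormalDist


def q_threshold(eigvals, k, alpha=0.001):
    """Q-statistic (Jackson-Mudholkar) threshold on the SPE, computed from the
    covariance eigenvalues of the residual (non-top-k) axes."""
    lam = np.asarray(eigvals)[k:]
    phi1, phi2, phi3 = (np.sum(lam**i) for i in (1, 2, 3))
    h0 = 1 - 2 * phi1 * phi3 / (3 * phi2**2)
    c_alpha = NormalDist().inv_cdf(1 - alpha)   # (1 - alpha) standard-normal quantile
    inner = (c_alpha * np.sqrt(2 * phi2 * h0**2) / phi1
             + 1 + phi2 * h0 * (h0 - 1) / phi1**2)
    return phi1 * inner ** (1 / h0)


def detect(Y, k, alpha=0.001):
    """Training + detection: fit the subspaces on mean-centered Y, then flag the
    timebins whose SPE (squared norm of the residual) exceeds the threshold."""
    _, s, Vt = np.linalg.svd(Y, full_matrices=False)
    P = Vt[:k].T                                   # normal-subspace basis
    spe = np.sum((Y - Y @ P @ P.T) ** 2, axis=1)   # SPE per timebin
    threshold = q_threshold(s**2 / (Y.shape[0] - 1), k, alpha)
    return np.flatnonzero(spe > threshold), spe


# Hypothetical data: daily cycle on 20 links, spike injected on link 3 at bin 1000.
rng = np.random.default_rng(1)
t = np.arange(7 * 288)
Y = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
Y += 0.1 * rng.standard_normal(Y.shape)
Y[1000, 3] += 5.0
Y -= Y.mean(axis=0)

flagged, spe = detect(Y, k=1)
print(flagged)
```

At this confidence level a few noise-only timebins may also be flagged; the threshold is itself a tunable knob.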
Pictorial overview of problems with subspace methodology
- Defining normalcy can be challenging
  - Tunable knobs
  - Contamination
- PCA's coordinate remapping makes it difficult to identify the original location of an anomaly
Data used
- Géant and Abilene networks
- IP flow traces, 21/11 through 28/
- Anomalies were manually verified
Outline
- Background and motivation
- Problems with approach
  - Sensitivity to its parameters
  - Contamination of normalcy
  - Identifying the location of detected anomalies
- Conclusion & future directions
Sensitivity to top k
- PCA separates normal from anomalous traffic patterns
- This works because the top PCs tend to capture periodic trends and a large fraction of the variance
Sensitivity to top k
- Where is the line drawn between normal and anomalous?
- What is too anomalous?
Sensitivity to top k
- Detection is very sensitive to the number of principal components included!
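To make the sensitivity concrete, the sketch below computes, for one timebin, the squared residual left after removing the top k axes, for every k. It assumes a mean-centered timebins x links matrix; the function name, the two periodic patterns and the single-link spike are synthetic, hypothetical choices. Because each additional axis can only remove energy, the residual is non-increasing in k, so whether the timebin clears any fixed threshold depends on where k is set.

```python
import numpy as np


def spe_by_k(Y, row):
    """Squared prediction error of timebin `row` for every k from 0 to #axes."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    y = Y[row]
    errors = []
    for k in range(Vt.shape[0] + 1):
        P = Vt[:k].T                                   # top-k axes (empty for k = 0)
        errors.append(float(np.sum((y - P @ (P.T @ y)) ** 2)))
    return np.array(errors)


# Hypothetical data: daily and half-day cycles on 20 links, plus a spike on link 3.
rng = np.random.default_rng(1)
t = np.arange(7 * 288)
Y = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
Y += np.outer(np.cos(2 * np.pi * t / 144), rng.uniform(0.5, 2, size=20))
Y += 0.1 * rng.standard_normal(Y.shape)
Y[1000, 3] += 5.0
Y -= Y.mean(axis=0)

errors = spe_by_k(Y, row=1000)
print(np.round(errors[:5], 2))
```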
Sensitivity to top k
- Sensitivity wouldn't be an issue if we could reliably tune the top-k parameter
- We've tried many different methods:
  - 3σ deviation heuristic
  - Cattell's Scree Test
  - Humphrey-Ilgen
  - Kaiser's Criterion
- None are reliable
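Two of these rules are easy to state in code. The sketch below assumes Kaiser's criterion is applied to the link correlation matrix (keep the axes whose eigenvalue exceeds 1), and that the 3σ heuristic assigns the first axis whose projection deviates more than three standard deviations from its mean, together with every later axis, to the anomalous subspace. Function names and data are hypothetical; the sketch shows how the rules are computed, not that they pick a good k. Cattell's Scree Test and Humphrey-Ilgen are not sketched.

```python
import numpy as np


def kaiser_k(Y):
    """Kaiser's criterion on the link correlation matrix: count eigenvalues > 1."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(Y, rowvar=False))
    return int(np.sum(eigvals > 1.0))


def three_sigma_k(Y):
    """3-sigma heuristic: walk the axes in variance order and return the index of
    the first one whose projection strays more than 3 std devs from its mean."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    proj = Yc @ Vt.T                      # timebins x axes, one column per axis
    for i in range(proj.shape[1]):
        u = proj[:, i]
        if np.any(np.abs(u - u.mean()) > 3 * u.std()):
            return i                      # axes i, i+1, ... treated as anomalous
    return proj.shape[1]


# Hypothetical data: 20 links sharing a daily cycle, plus noise.
rng = np.random.default_rng(2)
t = np.arange(7 * 288)
Y = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
Y += 0.1 * rng.standard_normal(Y.shape)

print(kaiser_k(Y), three_sigma_k(Y))
```

On clean, single-cycle data like this the rules look well behaved; the slide's point is that on real traces they do not agree reliably.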
Contamination of normalcy
- What happens to large anomalies?
  - They capture a large fraction of the variance
  - Therefore they are included among the top PCs
- This invalidates the assumption that the top PCs are periodic
- It pollutes the definition of normal
- In our study, the outage shown on the left affected 75 of 77 links
  - It was detected on only a handful!
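The contamination effect can be reproduced on synthetic data. The sketch below, with hypothetical helpers `fit` and `spe`, trains the subspace model once on clean traffic and once on traffic containing a large level drop on 15 of 20 links, then compares the outage's mean squared prediction error under each model. The choice k = 2 and all magnitudes are assumptions.

```python
import numpy as np


def fit(Y, k):
    """Training: per-link mean and the top-k principal axes of Y."""
    mu = Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    return mu, Vt[:k].T


def spe(Y, mu, P):
    """Squared prediction error of each timebin of Y under a trained model."""
    Yc = Y - mu
    R = Yc - Yc @ P @ P.T
    return np.sum(R**2, axis=1)


# Hypothetical data: daily cycle on 20 links; the outage is a large level drop
# on 15 of the 20 links for 200 timebins.
rng = np.random.default_rng(3)
t = np.arange(7 * 288)
clean = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
clean += 0.1 * rng.standard_normal(clean.shape)
contaminated = clean.copy()
window = slice(1000, 1200)
contaminated[window, :15] -= 20.0

k = 2
spe_clean_model = spe(contaminated, *fit(clean, k))[window].mean()
spe_polluted_model = spe(contaminated, *fit(contaminated, k))[window].mean()
print(f"{spe_clean_model:.1f} vs {spe_polluted_model:.3f}")
```

When the outage is part of the training data, its direction lands among the top two axes, so its residual shrinks to roughly noise level and it no longer stands out.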
Identifying anomaly locations
- Spikes appear when the state vector is projected onto the anomalous subspace
- But network operators don't care about this
  - They want to know where the anomaly happened!
- How do we find the original location of the anomaly?
Identifying anomaly locations
- Previous work used a simple heuristic
  - Associate a detected spike with the k flows that contribute most to the state vector v
- There is no clear a priori reason for this association
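A sketch of this heuristic under one reading of it: the detected timebin's state vector v is split into normal and residual parts, and the original coordinates (links here, flows on the slide) are ranked by the magnitude of their residual entry. Ranking by the residual rather than by v itself is an assumption, as are the function name, the parameter `n` (the slide's k flows, renamed so it does not clash with the number of PCs) and the synthetic spike.

```python
import numpy as np


def top_contributors(Y, row, k, n=1):
    """Rank links by the magnitude of their entry in the residual (anomalous-
    subspace) part of state vector Y[row]; return the n largest."""
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    P = Vt[:k].T                          # top-k axes: the normal subspace
    v = Y[row]
    resid = v - P @ (P.T @ v)             # part of v in the anomalous subspace
    return np.argsort(-np.abs(resid))[:n]


# Hypothetical data: daily cycle on 20 links, spike injected on link 3 at bin 1000.
rng = np.random.default_rng(4)
t = np.arange(7 * 288)
Y = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
Y += 0.1 * rng.standard_normal(Y.shape)
Y[1000, 3] += 5.0
Y -= Y.mean(axis=0)

print(top_contributors(Y, row=1000, k=1, n=3))
```

A single-link spike is the easy case; when an anomaly spans many links, nothing guarantees that the largest residual entries point back to its source.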
Outline
- Background and motivation
- Problems with approach
- Conclusion & future directions
  - Defining normalcy
  - Identifying the location of an anomaly
Defining normalcy
- Large anomalies can cause a spike in the first few PCs
  - This diminishes detection effectiveness
  - But we can presumably smooth these spikes out (e.g., with a weighted moving average, WMA)
- But the first PCs aren't always periodic
  - Which k, instead of top k?
  - Initial results suggest this might also be challenging
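Assuming WMA refers to a weighted moving average with linearly increasing weights (the slide does not give the weights or the window length), the smoothing step might look like the sketch below, applied to a hypothetical PC projection containing a one-bin spike. The function name `wma` and the window n = 5 are illustrative.

```python
import numpy as np


def wma(x, n):
    """Weighted moving average with linearly increasing weights: the newest of the
    n samples in each window gets weight n, the oldest weight 1."""
    w = np.arange(1, n + 1, dtype=float)
    w /= w.sum()
    # np.convolve flips its second argument, so pass w reversed to put the
    # largest weight on the most recent sample of each window.
    return np.convolve(x, w[::-1], mode="valid")


x = np.zeros(50)
x[20] = 100.0                 # hypothetical one-bin spike in a PC's projection
smoothed = wma(x, n=5)
print(smoothed.max())
```

Smoothing lowers the spike's peak to 2/(n + 1) of its original height and spreads it over n bins; it dampens the spike rather than removing it.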
Fundamental disconnect between objective functions
- PCA is optimal at finding orthogonal vectors ordered by captured variance
- But variance need not correspond to normalcy (i.e., periodicity)
- When do they coincide?
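One way to ask "which k" rather than "top k" is to score each principal axis by periodicity instead of variance. The sketch below assumes periodicity is measured as the share of an axis's non-DC spectral power that falls at the daily frequency, for a window holding a whole number of days; this score, the function name and the synthetic outage are hypothetical. In the example the one-day outage captures more variance than the daily cycle, so variance ranks it first while the periodicity score ranks the daily axis first.

```python
import numpy as np


def periodicity_scores(Y, cycles):
    """For each principal axis (in variance order), the fraction of its projection's
    non-DC spectral power at frequency bin `cycles` (whole periods in the window)."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    proj = Yc @ Vt.T                                  # timebins x axes
    power = np.abs(np.fft.rfft(proj, axis=0)) ** 2    # power spectrum per axis
    return power[cycles] / power[1:].sum(axis=0)


# Hypothetical data: 7 whole days of 5-minute bins on 20 links.
rng = np.random.default_rng(5)
t = np.arange(7 * 288)
Y = 0.1 * rng.standard_normal((t.size, 20))
Y[:, :10] += np.sin(2 * np.pi * t / 288)[:, None]   # daily cycle on links 0-9
Y[3 * 288:4 * 288, 10:] -= 10.0                     # one-day outage on links 10-19

scores = periodicity_scores(Y, cycles=7)
print(np.round(scores[:3], 3))
```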
Identifying anomaly locations
- PCA is very effective at finding correlations
- But it does so by remapping all data to a new coordinate system
- Its strength in detection becomes a weakness in identification
- This is an inherent limitation
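The remapping problem is visible even for a unit anomaly confined to a single link. In the sketch below (synthetic data; k = 1 and the choice of link 3 are assumptions), the anomaly becomes a mix of several principal-axis coordinates, and its residual, mapped back to link coordinates, is nonzero on other links as well, so any rule for pointing back to the originating link is a heuristic.

```python
import numpy as np

# Hypothetical data: daily cycle on 20 links, plus noise.
rng = np.random.default_rng(6)
t = np.arange(7 * 288)
Y = np.outer(np.sin(2 * np.pi * t / 288), rng.uniform(1, 5, size=20))
Y += 0.1 * rng.standard_normal(Y.shape)
Y -= Y.mean(axis=0)
_, _, Vt = np.linalg.svd(Y, full_matrices=False)

e = np.zeros(20)
e[3] = 1.0                      # unit anomaly confined to link 3
coords = Vt @ e                 # the same anomaly in principal-axis coordinates
P = Vt[:1].T                    # k = 1: basis of the normal subspace
resid = e - P @ (P.T @ e)       # its residual, mapped back to link coordinates

print("axes touched:", np.count_nonzero(np.abs(coords) > 1e-3))
print("links touched by residual:", np.count_nonzero(np.abs(resid) > 1e-3))
```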
Conclusion
- PCA is sensitive to its parameters
  - A more robust methodology is required
  - Training: defining normalcy (top k, which k)
  - Detection: tuning the threshold
  - Identification: a better heuristic
- Disconnect between objective functions
  - PCA finds variance
  - We seek periodicity
- PCA's strengths can be weaknesses
  - Its transformation is good at detecting correlations
  - But it makes identifying the anomaly location difficult
Thanks! Questions?
Haakon Ringberg
Princeton University Computer Science
Outline
- Background and motivation
- Problems with approach
- Future directions
- Conclusion
  - Addressable problems versus fundamental problems
Conclusion: addressable
- PCA is sensitive to its parameters
  - A more robust methodology is required
  - Training: defining normalcy (top k, which k)
  - Detection: tuning the threshold
  - Identification: a better heuristic
- Previous work used the same data and optimized parameter settings as Lakhina et al.
- But these concerns might be addressable
Conclusion: fundamental
- We don't know what "normal" is
- Disconnect between objective functions
  - PCA finds variance
  - We seek periodicity
- PCA's strengths can be weaknesses
  - Its transformation is good at detecting correlations
  - But it makes identifying the anomaly location difficult
- Are other methods more appropriate?
- We require a standardized evaluation framework