Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.

Slides:



Advertisements
Similar presentations
URCA: Pulling out Anomalies by their Root Causes Fernando Silveira and Christophe Diot.
Advertisements

Compensation for Measurement Errors Due to Mechanical Misalignments in PCB Testing Anura P. Jayasumana, Yashwant K. Malaiya, Xin He, Colorado State University.
Detectability of Traffic Anomalies in Two Adjacent Networks Augustin Soule, Haakon Ringberg, Fernando Silveira, Jennifer Rexford, Christophe Diot.
G. Alonso, D. Kossmann Systems Group
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
FLAME: A Flow-level Anomaly Modeling Engine
Forward-Backward Correlation for Template-Based Tracking Xiao Wang ECE Dept. Clemson University.
Nick Duffield, Patrick Haffner, Balachander Krishnamurthy, Haakon Ringberg Rule-Based Anomaly Detection on IP Flows.
Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford.
1 Communication-Efficient Online Detection of Network-Wide Anomalies Ling Huang* XuanLong Nguyen* Minos Garofalakis § Joe Hellerstein* Michael Jordan*
Robust Network Compressive Sensing Lili Qiu UT Austin NSF Workshop Nov. 12, 2014.
DECO3008 Design Computing Preparatory Honours Research KCDCC Mike Rosenman Rm 279
1 In-Network PCA and Anomaly Detection Ling Huang* XuanLong Nguyen* Minos Garofalakis § Michael Jordan* Anthony Joseph* Nina Taft § *UC Berkeley § Intel.
Statistics for the Social Sciences
SAC’06 April 23-27, 2006, Dijon, France On the Use of Spectral Filtering for Privacy Preserving Data Mining Songtao Guo UNC Charlotte Xintao Wu UNC Charlotte.
Detecting Image Region Duplication Using SIFT Features March 16, ICASSP 2010 Dallas, TX Xunyu Pan and Siwei Lyu Computer Science Department University.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
1 Toward Sophisticated Detection With Distributed Triggers Ling Huang* Minos Garofalakis § Joe Hellerstein* Anthony Joseph* Nina Taft § *UC Berkeley §
Learning-Based Anomaly Detection in BGP Updates Jian Zhang Jennifer Rexford Joan Feigenbaum.
Statistics for the Social Sciences Psychology 340 Fall 2006 Review For Exam 1.
Feature Extraction for Outlier Detection in High- Dimensional Spaces Hoang Vu Nguyen Vivekanand Gopalkrishnan.
School of Computer Science and Information Systems
EL 933 Final Project Presentation Combining Filtering and Statistical Methods for Anomaly Detection Augustin Soule Kav´e SalamatianNina Taft.
Network Anomography Yin Zhang, Zihui Ge, Albert Greenberg, Matthew Roughan Internet Measurement Conference 2005 Berkeley, CA, USA Presented by Huizhong.
A Signal Analysis of Network Traffic Anomalies Paul Barford, Jeffrey Kline, David Plonka, and Amos Ron.
Principal Component Analysis. Philosophy of PCA Introduced by Pearson (1901) and Hotelling (1933) to describe the variation in a set of multivariate data.
CS 485/685 Computer Vision Face Recognition Using Principal Components Analysis (PCA) M. Turk, A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive.
A Signal Analysis of Network Traffic Anomalies Paul Barford with Jeffery Kline, David Plonka, Amos Ron University of Wisconsin – Madison Summer, 2002.
Principal Component Analysis (PCA) for Clustering Gene Expression Data K. Y. Yeung and W. L. Ruzzo.
Robust PCA in Stata Vincenzo Verardi FUNDP (Namur) and ULB (Brussels), Belgium FNRS Associate Researcher.
NUS CS5247 A dimensionality reduction approach to modeling protein flexibility By, By Miguel L. Teodoro, George N. Phillips J* and Lydia E. Kavraki Rice.
A Statistical Anomaly Detection Technique based on Three Different Network Features Yuji Waizumi Tohoku Univ.
Presented By Wanchen Lu 2/25/2013
Traffic Classification through Simple Statistical Fingerprinting M. Crotti, M. Dusi, F. Gringoli, L. Salgarelli ACM SIGCOMM Computer Communication Review,
by B. Zadrozny and C. Elkan
SIGCOMM 2002 New Directions in Traffic Measurement and Accounting Focusing on the Elephants, Ignoring the Mice Cristian Estan and George Varghese University.
Xin He, Yashwant Malaiya, Anura P. Jayasumana Kenneth P
Scalable Analysis of Distributed Workflow Traces Daniel K. Gunter and Brian Tierney Distributed Systems Department Lawrence Berkeley National Laboratory.
People Detection in Video Stream Presented By: Engy Foda Supervised By: Dr. Ahmed Darwish Dr. Ihab Talkhan Dr. Salah El Tawil Cairo University Faculty.
Network Anomography Yin Zhang – University of Texas at Austin Zihui Ge and Albert Greenberg – AT&T Labs Matthew Roughan – University of Adelaide IMC 2005.
Factor Analysis Psy 524 Ainsworth. Assumptions Assumes reliable correlations Highly affected by missing data, outlying cases and truncated data Data screening.
1 Distributed Detection of Network-Wide Traffic Anomalies Ling Huang* XuanLong Nguyen* Minos Garofalakis § Joe Hellerstein* Michael Jordan* Anthony Joseph*
Statistical Analysis of Inlining Heuristics in Jikes RVM Jing Yang Department of Computer Science, University of Virginia.
Personal Computers: in the 1990’s Presenter: Michael Verhaart Picture of a computer or Clip Art.
GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer.
Applying 3-D Methods to Video for Compression Salih Burak Gokturk Anne Margot Fernandez Aaron March 13, 2002 EE 392J Project Presentation.
Blind Information Processing: Microarray Data Hyejin Kim, Dukhee KimSeungjin Choi Department of Computer Science and Engineering, Department of Chemical.
Mining Anomalies in Network-Wide Flow Data Anukool Lakhina with Mark Crovella and Christophe Diot NANOG35, Oct 23-25, 2005.
Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar.
Bradley Cowie Supervised by Barry Irwin Security and Networks Research Group Department of Computer Science Rhodes University DATA CLASSIFICATION FOR CLASSIFIER.
Reduces time complexity: Less computation Reduces space complexity: Less parameters Simpler models are more robust on small datasets More interpretable;
ASTUTE: Detecting a Different Class of Traffic Anomalies Fernando Silveira 1,2, Christophe Diot 1, Nina Taft 3, Ramesh Govindan 4 1 Technicolor 2 UPMC.
EE515/IS523: Security 101: Think Like an Adversary Evading Anomarly Detection through Variance Injection Attacks on PCA Benjamin I.P. Rubinstein, Blaine.
An Improved Approach For Image Matching Using Principle Component Analysis(PCA An Improved Approach For Image Matching Using Principle Component Analysis(PCA.
Network Anomography Yin Zhang Joint work with Zihui Ge, Albert Greenberg, Matthew Roughan Internet Measurement.
Detecting Undesirable Insider Behavior Joseph A. Calandrino* Princeton University Steven J. McKinney* North Carolina State University Frederick T. Sheldon.
Network Anomaly Detection Using Autonomous System Flow Aggregates Thienne Johnson 1,2 and Loukas Lazos 1 1 Department of Electrical and Computer Engineering.
Central limit theorem - go to web applet. Correlation maps vs. regression maps PNA is a time series of fluctuations in 500 mb heights PNA = 0.25 *
Multivariate statistical methods. Multivariate methods multivariate dataset – group of n objects, m variables (as a rule n>m, if possible). confirmation.
Big Data Quality Panel Norman Paton University of Manchester.
Martina Uray Heinz Mayer Joanneum Research Graz Institute of Digital Image Processing Horst Bischof Graz University of Technology Institute for Computer.
Distributed Network Monitoring in the Wisconsin Advanced Internet Lab Paul Barford Computer Science Department University of Wisconsin – Madison Spring,
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Experience Report: System Log Analysis for Anomaly Detection
Exploratory Factor Analysis
Face Recognition and Feature Subspaces
Detecting Targeted Attacks Using Shadow Honeypots
by Xiang Mao and Qin Chen
○ Hisashi Shimosaka (Doshisha University)
Marios Mattheakis and Pavlos Protopapas
Presentation transcript:

Sensitivity of PCA for Traffic Anomaly Detection Evaluating the robustness of current best practices Haakon Ringberg 1, Augustin Soule 2, Jennifer Rexford 1, Christophe Diot 2 1 Princeton University, 2 Thomson Research

2 Outline Context Background and motivation Bigger picture PCA (subspace method) in one slide Challenges with current PCA methodology Conclusion & future directions

3 Background Promising applications of PCA to AD [Lakhina et al, SIGCOMM 04 & 05] But we weren’t nearly as successful applying technique to a new data set Same source code What were we doing wrong? Unable to tune the technique

4 Bigger Picture Many statistical techniques evaluated for AD e.g., Wavelets, PCA, Kalman filters Promising early results But questions about performance remain What did the researchers have to do in order to achieve presented results?

5 Questions about techniques “Tunability” of technique Number of parameters Sensitivity to parameters Interpretability of parameters Other aspects of robustness Sensitivity to drift in underlying data Sensitivity to sampling Assumptions about the underlying data

6 Principal Components Analysis (PCA) PCA transforms data into new coordinate system Principal components (new bases) ordered by captured variance The first k (top k ) tend to capture periodic trends normal subspace vs. anomalous subspace

7 Data used Géant and Abilene networks IP flow traces 21/11 through 28/ Detected anomalies were manually inspected

8 Outline Context Challenges with current PCA methodology Sensitivity to its parameters Contamination of normalcy Identifying the location of detected anomalies Conclusion & future directions

9 Sensitivity to top k Where is the line drawn between normal and anomalous? What is too anomalous?

10 Sensitivity to top k Very sensitive to top k Total detections and FP Not an issue if top k were tunable Tried many methods 3σ deviation heuristic Cattell’s Scree Test Humphrey-Ilgen Kaiser’s Criterion None are reliable

11 Contamination of normalcy Large anomalies may be included among top k Invalidates assumption that top PCs are periodic Pollutes definition of normal In our study, the outage to the left affected 75/77 links Only detected on a handful!

12 Conclusion & future directions PCA (subspace method) methodology issues Sensitivity to top k parameter Contamination of normal subspace Identifying the location of detected anomalies Generally: room for rigorous evaluation of statistical techniques applied to AD Tunability, robustness Assumptions about underlying data Under what conditions does method excel?

Thanks! Questions? Haakon Ringberg Princeton University Computer Science

14 Identifying anomaly locations Spikes when state vector projected on anomaly subspace But network operators don’t care about this They want to know where it happened! How do we find the original location of the anomaly?

15 Identifying anomaly locations Previous work used a simple heuristic Associate detected spike with k flows with the largest contribution to the state vector v No clear a priori reason for this association