Robust Network Compressive Sensing

Presentation transcript:

Robust Network Compressive Sensing Yi-Chao Chen The University of Texas at Austin Joint work with Lili Qiu*, Yin Zhang*, Guangtao Xue+, Zhenxian Hu+ The University of Texas at Austin* Shanghai Jiao Tong University+ ACM MobiCom 2014 September 10, 2014

Network Matrices and Applications Traffic matrix Loss matrix Delay matrix Channel State Information (CSI) matrix RSS matrix

Q: How to fill in missing values in a matrix?

Missing Values: Why Bother? Applications need complete network matrices: traffic engineering, spectrum sensing, channel estimation, localization, multi-access channel design, network coding, wireless video coding, anomaly detection, data aggregation, and more. Network information creates exciting opportunities for network analytics and has a wide range of applications across all protocol layers. Example applications: traffic engineering — with the traffic matrix between all source-destination pairs, operators can provision traffic and avoid congestion; spectrum sensing — with power information across all frequencies, we can find the best band to use; channel estimation — with SNR information across all OFDM subcarriers, we can compute an effective SNR for wireless rate adaptation; channel selection — requires received signal strength (RSS) information across all channels in order to select the best channel.

The Problem Interpolation: fill in missing values (including anomalous and future entries) from incomplete, erroneous, and/or indirect measurements.

State of the Art Exploit the low-rank nature of network matrices: matrices are low rank [LPCD+04, LCD04, SIGCOMM09], i.e., X_{n×m} ≈ L_{n×r} × R_{m×r}^T with r « n, m. Exploit spatio-temporal properties: matrix rows or columns close to each other are often close in value [SIGCOMM09]. Exploit local structures in network matrices: matrices have both global and local structures; apply K-Nearest Neighbors (KNN) for local interpolation [SIGCOMM09].
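The low-rank factorization above can be sketched with a truncated SVD — a minimal numpy illustration of the X ≈ L R^T idea, not the SRMF algorithm itself:

```python
import numpy as np

def low_rank_approx(X, r):
    """Rank-r approximation X ~= L @ R.T via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    L = U[:, :r] * s[:r]   # n x r, singular values folded into the left factor
    R = Vt[:r, :].T        # m x r
    return L, R

# A noiseless rank-2 matrix is recovered exactly by its rank-2 approximation.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
L, R = low_rank_approx(X, 2)
rel_err = np.linalg.norm(X - L @ R.T) / np.linalg.norm(X)
```

For real network matrices the residual is nonzero, and the slides' point is precisely that anomalies and noise can make it large.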

Limitation Many factors contribute to network matrices Anomalies, measurement errors, and noise These factors may destroy low-rank structure and spatio-temporal locality

Network Matrices

Network            Date     Duration    Size (flows/links x #timeslots)
3G traffic         11/2010  1 day       472 x 144
WiFi traffic       1/2013               50 x 118
Abilene traffic    4/2003   1 week      121 x 1008
GEANT traffic      4/2005               529 x 672
1-channel CSI      2/2009   15 min.     90 x 9000
Multi-channel CSI  2/2014               270 x 5000
Cister RSSI                 4 hours     16 x 10000
CU RSSI            8/2007   500 frames  895 x 500
UMich RSS          4/2006   30 min.     182 x 3127
UCSB Meshnet                3 days      425 x 1527

Rank Analysis Not all matrices are low rank. GÉANT: 81% UMich RSS: 74% Cister RSSI: 20%
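One common way to quantify how low-rank a matrix is (used here for illustration; the paper's exact metric may differ) is the fraction of total energy captured by the top-r singular values:

```python
import numpy as np

def energy_fraction(X, r):
    """Fraction of the matrix's total energy (squared Frobenius norm)
    captured by its top-r singular values."""
    s = np.linalg.svd(X, compute_uv=False)
    return (s[:r] ** 2).sum() / (s ** 2).sum()

# A rank-1 matrix has all of its energy in the first singular value.
frac = energy_fraction(np.ones((5, 5)), 1)
```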

Rank Analysis (Cont.) Adding anomalies increases rank in all traces, by 13%~50%.

Temporal Stability Temporal stability varies across traces. UCSB Meshnet: 0.3%, UMich RSS: 0.8%, 3G: 10%, Cister RSSI: 6%.

Summary of the Analyses Our analyses reveal: real network matrices may not be low rank; adding anomalies increases the rank; temporal stability varies across traces.

Challenges How to explicitly account for anomalies, errors, and noise? How to support matrices with different temporal stability?

Robust Compressive Sensing A new matrix decomposition that is general and robust against errors/anomalies: a low-rank matrix, an anomaly matrix, and a noise matrix. A self-learning algorithm to automatically tune the parameters and account for varying temporal stability. An efficient optimization algorithm: search for the best parameters; work for large network matrices.

LENS Decomposition: Basic Formulation [Input] D: original matrix. [Output] X: a low-rank matrix (r « m, n). [Output] Y: a sparse anomaly matrix. [Output] Z: a small noise matrix. D = X + Y + Z. Given a network matrix with missing values and anomalies as input, we decompose it into three components. X is low rank because a few variables are enough to represent the matrix: for example, a traffic matrix can be approximated with a gravity model, and RSSI attenuates following a propagation model.

LENS Decomposition: Basic Formulation Formulate it as a convex optimization problem: decompose the original matrix D (input) into a low-rank matrix X, a sparse anomaly matrix Y, and a small noise matrix Z (outputs), by minimizing a weighted combination of a low-rank penalty on X (weight α), a sparsity penalty on Y (weight β), and a noise penalty on Z (weight σ), subject to X + Y + Z matching D on the observed entries.
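The paper's exact objective and update rules differ in detail; as a rough sketch, assuming a fully observed D and the standard convex surrogates (nuclear norm for low rank, l1 norm for sparsity, squared Frobenius norm for noise), the decomposition can be computed by alternating proximal steps:

```python
import numpy as np

def soft_threshold(M, t):
    """Elementwise prox of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def svt(M, t):
    """Singular value thresholding: prox of the nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def lens_like_decompose(D, alpha=1.0, beta=0.5, sigma=10.0, iters=50):
    """Alternately minimize
        alpha*||X||_* + beta*||Y||_1 + sigma*||D - X - Y||_F^2
    over X (low rank) and Y (sparse); Z = D - X - Y is the noise part.
    Each step is an exact prox update, so the objective never increases."""
    X = np.zeros_like(D)
    Y = np.zeros_like(D)
    for _ in range(iters):
        X = svt(D - Y, alpha / (2.0 * sigma))            # X-step
        Y = soft_threshold(D - X, beta / (2.0 * sigma))  # Y-step
    return X, Y, D - X - Y
```

The weights alpha, beta, sigma here are illustrative placeholders, not the paper's derived settings.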

LENS Decomposition: Support Indirect Measurement The matrix of interest may not be directly observable (e.g., traffic matrices). AX + BY + CZ + W = D, where A is the routing matrix, B is an over-complete anomaly profile matrix, and C is a noise profile matrix. This supports the need for indirect measurement.
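As a toy illustration of indirect measurement (the 3-link, 3-flow topology below is made up), per-link loads D are directly observable while the flow matrix X is not:

```python
import numpy as np

# Hypothetical routing matrix: A[l, f] = 1 if flow f traverses link l.
A = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

# X[f, t]: traffic of flow f at time t (the matrix we want to recover).
X = np.array([[3.0, 4.0],
              [1.0, 2.0],
              [5.0, 1.0]])

# D[l, t]: measurable link loads; each link sums the flows crossing it.
D = A @ X
```

Recovering X from D alone is underdetermined, which is why the decomposition's low-rank and sparsity structure is needed.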

LENS Decomposition: Account for Domain Knowledge The formulation can incorporate domain knowledge as additional penalty terms in the objective: temporal stability, spatial locality, and an initial solution.

Optimization Algorithm One of many challenges in the optimization: X and Y appear in multiple places in the objective and constraints, and this coupling makes the optimization hard. Reformulate the problem by introducing auxiliary variables.

Optimization Algorithm Alternating Direction Method (ADM): in each iteration, alternately optimize the augmented Lagrangian function over each of X, Xk, Y, Y0, Z, W, M, Mk, N while fixing the other variables. Improve efficiency through approximate SVD.
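The "approximate SVD" speed-up can be sketched with a randomized SVD in the style of Halko et al. — a generic technique; the paper's exact variant may differ:

```python
import numpy as np

def randomized_svd(M, r, oversample=10, seed=0):
    """Approximate rank-r SVD via random projection.
    Much cheaper than a full SVD when r << min(M.shape)."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((M.shape[1], r + oversample))
    Q, _ = np.linalg.qr(M @ P)       # orthonormal basis for the range of M
    B = Q.T @ M                      # small (r+oversample) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :r], s[:r], Vt[:r, :]
```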

Setting Parameters where (mX, nX) is the size of X, (mY, nY) is the size of Y, η(D) is the fraction of entries that are neither missing nor erroneous, and θ is a control parameter that limits contamination by dense measurement noise. Refer to the paper for the details of how α, β, and σ are set from these quantities.

Setting Parameters (Cont.) γ reflects the importance of domain knowledge (e.g., temporal stability varies across traces). Self-tuning algorithm: drop additional entries from the matrix; quantify the error on the entries that were present in the matrix but dropped intentionally during the search; pick the γ that gives the lowest error.
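The self-tuning loop can be sketched generically — the `interpolate` callback and the candidate γ grid below are placeholders, not the paper's implementation:

```python
import numpy as np

def self_tune_gamma(D, mask, interpolate, gammas, holdout_frac=0.1, seed=0):
    """Pick gamma by intentionally re-dropping some observed entries and
    scoring each candidate on how well it recovers them.
    interpolate(D_obs, mask_obs, gamma) -> full matrix estimate."""
    rng = np.random.default_rng(seed)
    obs = np.argwhere(mask)
    k = max(1, int(holdout_frac * len(obs)))
    held = obs[rng.choice(len(obs), size=k, replace=False)]
    train_mask = mask.copy()
    train_mask[held[:, 0], held[:, 1]] = False   # hide the held-out entries
    best_gamma, best_err = None, np.inf
    for g in gammas:
        est = interpolate(D * train_mask, train_mask, g)
        err = np.abs(est[held[:, 0], held[:, 1]]
                     - D[held[:, 0], held[:, 1]]).sum()
        if err < best_err:
            best_gamma, best_err = g, err
    return best_gamma
```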

Evaluation Methodology Metric: Normalized Mean Absolute Error (NMAE) for missing values, a standard metric that captures the estimation error; report the average of 10 random runs. Anomaly generation: inject anomalies into a varying fraction of entries with varying sizes. Different dropping models.
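A common definition of NMAE over the missing entries (assumed here; the paper may normalize slightly differently):

```python
import numpy as np

def nmae(truth, estimate, missing_mask):
    """Normalized Mean Absolute Error over the missing entries:
    sum|est - true| / sum|true|, restricted to entries that were missing."""
    t = truth[missing_mask]
    e = estimate[missing_mask]
    return np.abs(e - t).sum() / np.abs(t).sum()
```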

Algorithms Compared

Algorithm             Description
Baseline              Baseline estimate via rank-2 approximation
SVD-base              SRSVD with baseline removal
SVD-base+KNN          Apply KNN after SVD-base
SRMF [SIGCOMM09]      Sparsity Regularized Matrix Factorization
SRMF+KNN [SIGCOMM09]  Hybrid of SRMF and KNN
LENS                  Robust network compressive sensing

Self-Learned γ Best γ = 0, best γ = 1, and best γ = 10 for different traces. No single γ works for all traces; self-tuning allows us to automatically select the best γ.

Interpolation under anomalies CU RSSI LENS performs the best under anomalies.

Interpolation without anomalies CU RSSI LENS performs the best even without anomalies.

Computation Time LENS is efficient due to local optimization in ADM.

Summary of Other Results The improvement of LENS increases with anomaly sizes and # anomalies. LENS consistently performs the best under different dropping modes. LENS yields the lowest prediction error. LENS achieves higher anomaly detection accuracy.

Conclusion Main contributions: showed the important impact of anomalies on matrix interpolation; decompose a matrix into a low-rank matrix, a sparse anomaly matrix, and a dense but small noise matrix; an efficient optimization algorithm; a self-learning algorithm to automatically tune the parameters. Future work: applying it to spectrum sensing, channel estimation, localization, etc. The major finding is that anomalies have a great impact on matrix interpolation; we are the first to address this problem and to propose a robust method.

Thank you!

Backup Slides

Missing Values: Why Bother? Missing values are common in network matrices: measurement and data collection are unreliable; anomalies/outliers hide non-anomaly-related traffic; future entries have not yet appeared; direct measurement is infeasible/expensive — e.g., only link loads are observable: y1,t = x1,t + x2,t, where xf,t is the traffic of flow f at time t.

Anomaly Generation Anomalies in real-world network matrices: the ground truth is hard to get, and injecting the same anomaly size into all network matrices is not practical. So we generate anomalies based on the nature of each network matrix [SIGCOMM04, INFOCOM07, PETS11]: apply an Exponentially Weighted Moving Average (EWMA) to predict the matrix; calculate the difference between the real matrix and the predicted matrix (the difference is due to either prediction error or anomalies); sort the differences and select the largest ones as the anomaly sizes.
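The EWMA-based anomaly sizing above can be sketched as follows (the smoothing factor α and forecasting details are assumptions):

```python
import numpy as np

def ewma_anomaly_size(series, alpha=0.3, top_k=1):
    """Predict each point with an EWMA of the past values; the largest
    |actual - predicted| differences are taken as anomaly sizes."""
    pred = np.empty_like(series, dtype=float)
    s = series[0]
    pred[0] = s
    for t in range(1, len(series)):
        pred[t] = s                           # forecast = smoothed history
        s = alpha * series[t] + (1 - alpha) * s
    diff = np.abs(series - pred)
    return np.sort(diff)[::-1][:top_k]
```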

Interpolation under anomalies 3G LENS performs the best under anomalies.

Interpolation without anomalies 3G LENS performs the best even without anomalies.

Non-monotonicity of Performance The SVD-based method is more sensitive to anomalies. Performance is affected by both the missing rate and the number of anomalies: as the missing rate increases, the number of anomalies among the observed entries decreases. When the missing rate increases, error increases; when the number of anomalies decreases, error decreases.

Dropping Models Pure Random Loss: elements are dropped independently with a random loss rate.

Dropping Models Time Rand Loss: columns are dropped, to emulate random losses during certain times (e.g., disk becomes full).

Dropping Models Element Rand Loss: rows are dropped, to emulate certain nodes losing data (e.g., due to battery drain).

Dropping Models Element Synchronized Loss: rows are dropped at the same time, to emulate certain nodes experiencing the same loss events at the same time (e.g., power outage of an area).
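The dropping models above can be sketched as boolean observation masks (the parameter choices p and frac are illustrative, not the paper's settings):

```python
import numpy as np

def drop_mask(shape, model, p=0.3, frac=0.3, seed=0):
    """Boolean mask for a matrix of the given shape:
    True = observed, False = dropped."""
    rng = np.random.default_rng(seed)
    m, n = shape
    keep = np.ones(shape, dtype=bool)
    if model == "PureRandLoss":            # independent random losses
        keep = rng.random(shape) >= p
    elif model == "TimeRandLoss":          # losses during certain time slots
        cols = rng.choice(n, size=int(frac * n), replace=False)
        keep[:, cols] &= rng.random((m, len(cols))) >= p
    elif model == "ElemRandLoss":          # certain nodes (rows) lose data
        rows = rng.choice(m, size=int(frac * m), replace=False)
        keep[rows, :] &= rng.random((len(rows), n)) >= p
    elif model == "ElemSyncLoss":          # a group of rows lose the same slots
        rows = rng.choice(m, size=int(frac * m), replace=False)
        cols = rng.choice(n, size=int(p * n), replace=False)
        keep[np.ix_(rows, cols)] = False
    return keep
```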

Impact of Dropping Models LENS consistently performs the best under the different dropping modes (TimeRandLoss, ElemRandLoss, ElemSyncLoss, RowRandLoss). (i) PureRandLoss: elements in a matrix are dropped independently with a random loss rate; (ii) xxTimeRandLoss: xx% of the columns in a matrix are selected, and the elements in these selected columns are dropped with a probability p, to emulate random losses during certain times (e.g., disk becomes full); (iii) xxElemRandLoss: xx% of the rows in a matrix are selected, and the elements in these selected rows are dropped with a probability p, to emulate certain nodes losing data (e.g., due to battery drain); (iv) xxElemSyncLoss: the intersection of xx% of the rows and p% of the columns in a matrix is dropped, to emulate a group of nodes experiencing the same loss events at the same time; (v) RowRandLoss: drop random rows to emulate node failures; (vi) ColRandLoss: drop random columns for completeness. We use PureRandLoss as the default, and use the other loss models to understand the impact of different loss models. We feed the matrices after dropping as input to LENS, and use LENS to fill in the missing entries.

Impact of Anomaly Sizes 3G The improvement of LENS increases with anomaly sizes.

Impact of Number of Anomalies 3G The improvement of LENS increases with # anomalies.

Impact of Noise CU RSSI

3G: LENS yields the lowest prediction error.

LENS achieves higher anomaly detection accuracy.

Computation Time Fig. 14

Optimization Algorithm Alternating Direction Method (ADM): the augmented Lagrangian function combines the original objective, the Lagrange multiplier terms, and the penalty terms.

Convergence