Threshold Compression for 3G Scalable Monitoring 1 Suk-Bok Lee, Dan Pei, MohammadTaghi Hajiaghayi, Ioannis Pefkianakis Songwu Lu, He Yan, Zihui Ge, Jennifer.

Slides:



Advertisements
Similar presentations
QoS-based Management of Multiple Shared Resources in Dynamic Real-Time Systems Klaus Ecker, Frank Drews School of EECS, Ohio University, Athens, OH {ecker,
Advertisements

CrowdER - Crowdsourcing Entity Resolution
Viral Marketing – Learning Influence Probabilities.
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
LEARNING INFLUENCE PROBABILITIES IN SOCIAL NETWORKS Amit Goyal Francesco Bonchi Laks V. S. Lakshmanan University of British Columbia Yahoo! Research University.
Label Placement and graph drawing Imo Lieberwerth.
Assessment. Schedule graph may be of help for selecting the best solution Best solution corresponds to a plateau before a high jump Solutions with very.
Train DEPOT PROBLEM USING PERMUTATION GRAPHS
Identifying Early Buyers from Purchase Data Paat Rusmevichientong, Shenghuo Zhu & David Selinger Presented by: Vinita Shinde Feb 18 th, 2010.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
1 Learning to Detect Objects in Images via a Sparse, Part-Based Representation S. Agarwal, A. Awan and D. Roth IEEE Transactions on Pattern Analysis and.
Self-Correlating Predictive Information Tracking for Large-Scale Production Systems Zhao, Tan, Gong, Gu, Wambolt Presented by: Andrew Hahn.
Decision Tree Algorithm
Computing Trust in Social Networks
Tracking Moving Objects in Anonymized Trajectories Nikolay Vyahhi 1, Spiridon Bakiras 2, Panos Kalnis 3, and Gabriel Ghinita 3 1 St. Petersburg State University.
[1][1][1][1] Lecture 2-3: Coping with NP-Hardness of Optimization Problems in Practice May 26 + June 1, Introduction to Algorithmic Wireless.
Minimum Maximum Degree Publish-Subscribe Overlay Network Design Melih Onus TOBB Ekonomi ve Teknoloji Üniversitesi, 28 Mayıs 2009.
Code and Decoder Design of LDPC Codes for Gbps Systems Jeremy Thorpe Presented to: Microsoft Research
Graphs and Topology Yao Zhao. Background of Graph A graph is a pair G =(V,E) –Undirected graph and directed graph –Weighted graph and unweighted graph.
Scalable Network Distance Browsing in Spatial Database Samet, H., Sankaranarayanan, J., and Alborzi H. Proceedings of the 2008 ACM SIGMOD international.
Simpath: An Efficient Algorithm for Influence Maximization under Linear Threshold Model Amit Goyal Wei Lu Laks V. S. Lakshmanan University of British Columbia.
New Challenges in Cloud Datacenter Monitoring and Management
H-1 Network Management Network management is the process of controlling a complex data network to maximize its efficiency and productivity The overall.
TOWARDS IDENTITY ANONYMIZATION ON GRAPHS. INTRODUCTION.
Tomer Sagi and Avigdor Gal Technion - Israel Institute of Technology Non-binary Evaluation for Schema Matching ER 2012 October 2012, Florence.
Query Planning for Searching Inter- Dependent Deep-Web Databases Fan Wang 1, Gagan Agrawal 1, Ruoming Jin 2 1 Department of Computer.
Toshihide IBARAKI Mikio KUBO Tomoyasu MASUDA Takeaki UNO Mutsunori YAGIURA Effective Local Search Algorithms for the Vehicle Routing Problem with General.
Special Topics on Algorithmic Aspects of Wireless Networking Donghyun (David) Kim Department of Mathematics and Computer Science North Carolina Central.
Domain decomposition in parallel computing Ashok Srinivasan Florida State University COT 5410 – Spring 2004.
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
Surface Simplification Using Quadric Error Metrics Michael Garland Paul S. Heckbert.
Outlier Detection Using k-Nearest Neighbour Graph Ville Hautamäki, Ismo Kärkkäinen and Pasi Fränti Department of Computer Science University of Joensuu,
Network Aware Resource Allocation in Distributed Clouds.
Topology aggregation and Multi-constraint QoS routing Presented by Almas Ansari.
A Distributed Clustering Framework for MANETS Mohit Garg, IIT Bombay RK Shyamasundar School of Tech. & Computer Science Tata Institute of Fundamental Research.
Event Detection using Customer Care Calls 04/17/2013 IEEE INFOCOM 2013 Yi-Chao Chen 1, Gene Moo Lee 1, Nick Duffield 2, Lili Qiu 1, Jia Wang 2 The University.
Report on Intrusion Detection and Data Fusion By Ganesh Godavari.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
A Graph-based Friend Recommendation System Using Genetic Algorithm
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Exploiting Context Analysis for Combining Multiple Entity Resolution Systems -Ramu Bandaru Zhaoqi Chen Dmitri V.kalashnikov Sharad Mehrotra.
Towards Efficient Large-Scale VPN Monitoring and Diagnosis under Operational Constraints Yao Zhao, Zhaosheng Zhu, Yan Chen, Northwestern University Dan.
Lecture 19 Greedy Algorithms Minimum Spanning Tree Problem.
Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.
Intradomain Traffic Engineering By Behzad Akbari These slides are based in part upon slides of J. Rexford (Princeton university)
Classification (slides adapted from Rob Schapire) Eran Segal Weizmann Institute.
Network Community Behavior to Infer Human Activities.
CS425: Algorithms for Web Scale Data Most of the slides are from the Mining of Massive Datasets book. These slides have been modified for CS425. The original.
Data Structures and Algorithms in Parallel Computing Lecture 7.
Author: Haoyu Song, Murali Kodialam, Fang Hao and T.V. Lakshman Publisher/Conf. : IEEE International Conference on Network Protocols (ICNP), 2009 Speaker:
1 An Analytical Model for the Dimensioning of a GPRS/EDGE Network with a Capacity Constraint on a Group of Cells r , r , r Nogueira,
Graph Indexing From managing and mining graph data.
Spanning Tree Method for Link State Aggregation in Large Communication Networks Whay Choiu Lee.
Biao Wang 1, Ge Chen 1, Luoyi Fu 1, Li Song 1, Xinbing Wang 1, Xue Liu 2 1 Shanghai Jiao Tong University 2 McGill University
Optimization-based Cross-Layer Design in Networked Control Systems Jia Bai, Emeka P. Eyisi Yuan Xue and Xenofon D. Koutsoukos.
1 Creating Situational Awareness with Data Trending and Monitoring Zhenping Li, J.P. Douglas, and Ken. Mitchell Arctic Slope Technical Services.
-1/16- Maximum Battery Life Routing to Support Ubiquitous Mobile Computing in Wireless Ad Hoc Networks C.-K. Toh, Georgia Institute of Technology IEEE.
A K-Main Routes Approach to Spatial Network Activity Summarization(SNAS) Group 8.
Unified Adaptivity Optimization of Clock and Logic Signals Shiyan Hu and Jiang Hu Dept of Electrical and Computer Engineering Texas A&M University.
The Biologically Inspired Distributed File System: An Emergent Thinker Instantiation Presented by Dr. Ying Lu.
Threads vs. Events SEDA – An Event Model 5204 – Operating Systems.
International Conference on Data Engineering (ICDE 2016)
A Study of Group-Tree Matching in Large Scale Group Communications
Presented by: Dr Beatriz de la Iglesia
Threshold Compression for 3G Scalable Monitoring
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
15th Scandinavian Workshop on Algorithm Theory
Chrysostomos Koutsimanis and G´abor Fodor
Presentation transcript:

Threshold Compression for 3G Scalable Monitoring 1 Suk-Bok Lee, Dan Pei, MohammadTaghi Hajiaghayi, Ioannis Pefkianakis Songwu Lu, He Yan, Zihui Ge, Jennifer Yates, Mario Kosseifi CMU AT&T Labs-Research UCLA U of Maryland AT&T Network Services

What’s unique challenges on 3G network monitoring? 2 A large number of network elements (NEs) E.g. several thousands cell-sites in a single market area Different-types of NEs: GGSN - SGSN - RNC - NodeB – Sector Various KPIs (key performance metrics) Dynamics on measurement results Both in time and spatial domains Reflecting: Mobile user’s daily 3G usage pattens Cell-site physical location & network topology

Naïve threshold-based alarming model is not scalable to large 3G networks 3 Single static threshold (across locations): poor alarm quality Fine-grained threshold (location & time specific): management complexity

Possible thresholding schemes with different monitoring granularity 4 1.Per-NE-hourly (fine-grained location & time dependent) Each NE has its own hourly thresholds 2.Per-NE-static Each NE has a single (aggregating all hours) threshold 3.Per-NEtype-hourly Every NE shares the same hourly (aggregating all NEs) thresholds 4.Per-NEtype-static A single threshold (aggregating all hours and all NEs)

None of them are scalable to 3G monitoring 5 Per-NE-hourly ideal for capturing dynamic 3G characteristics threshold size per KPI grows very large with network size Aggregate-based threshold schemes small threshold settings high FPR (false positive rate) and FNR (false negative rate) Thresholding on DL-throughput in a single area (2010/ /10)

Fundamental tradeoff: threshold setting vs. alarm quality 6 Fine-grained spatial-temporal thresholds Pros: good alarm quality Capture well each NE’s location and time specific behavior Cons: large # of thresholds, management complexity E.g. a single area has >30,000 thresholds per KPI Aggregate-based thresholds Pros: a single threshold value for all NEs and hours low system management overhead Cons: poor alarm quality E.g. can be observed ~70% false positive/false negatives Can we have both advantages (small threshold settings and good alarm quality) in a large 3G network?

Our solution: threshold compression 7 Intelligent threshold aggregation Observation 1: Some group of NEs show similar threshold behaviors  threshold aggregation via NE grouping Observation 2: Certain group of hours show similar threshold behaviors  threshold aggregation via hourly grouping Our threshold-compression characterizes the location- and time-specific threshold trend of each NE with minimal threshold setting Maintains acceptable alarm accuracy

Observation of similar threshold behavior 8 Spatial-domain similarity Geographic locations & user population around NEs Time-domain similarity Daily trend of 3G usage pattern

Desirable properties of the solution 9 1.High compression gain Small threshold setting even with large number of NEs 2.Low false alarm rate Enforced by two input parameters α and β Applying α and β to historical data  permissible interval 3.Management-oriented grouping Each NE belongs to only one NE group, but multiple hour groups within an NE group  two-level hierarchical clustering

Threshold compression problem formulation 10 Objective function Find the minimum number of spatial-temporal clusters from a given fine-grained threshold setting (i.e. per-NE-hourly) Constraints 1.Each compressed (aggregated) threshold must be within the permissible threshold interval of each spatial-temporal block which it represents to 2.NE grouping must be consistent across time Hardness result This problem is not only NP-hard, but indeed inapproximable as well

Threshold compression algorithm: two-staged approach 11 1.Spatial NE grouping Identifies NE groups each showing similar threshold behavior each hour among its members Each NE group consists of 24 hour-groups 2.Temporal-domain clustering within each NE group Takes the NE grouping result as input to perform hour grouping for each identified NE group Strategy for clustering Combine spatial-temporal blocks if they 1.have common intersection in their permissible intervals 2.Meet the consistent NE grouping rule

NE grouping: greedy coloring approach 12 1.Convert to graph Each NE  vertex Put edge between two NEs, if they have disjoint permissible interval in any hour 2.Apply graph coloring Minimum number of colors (NE groups) assignable to each vertex (NE) such that no edge (common intersection) connects two identically colored vertices (NE group members) We apply the Welsh-Powell coloring algorithm that uses at most one more than the maximum degree of the graph

Hour grouping: minimum cover selection 13 Convert to intervals Each hour  its threshold (permissible) interval Do minimum cover Find the minimum number of interval groups such that there is common intersection in each interval group 1.Sort all the interval endpoints 2.Scan until first encountering an upperbound point 3.Put all intervals containing this point in to a new interval group 4.Repeat from step 2

Evaluation: compression gain & alarm quality 14 Within desired 10% false/miss alarm*, nearly 70-90% compression gain *In this study, we use slightly different definition of FPR = FP/(FP+TP) and FNR = FN/(FN+TP), to adapt them to the context where TP is much smaller than TN Compression gain on various KPIs (Training dataset 2010/06 – 2010/08)

Evaluation: compression gain & alarm quality by tuning input parameters 15 These give us a clear idea of how α and β should be chosen E.g., setting α=0.03 and β=0.04 meets the target FPR (<15%) and FNR (<10%), which leads to compression gain of 82% DL-throughput KPIs

Validation: operational experience 16 The resulting FPR and FNR are within our target 10-15% Validation results on various KPIs (Applying the compressed threshold setting to real data 2010/08 – 2010/10)

Validation: clustering stability over time 17 All KPIs show above 70% consistency  robustness of the solution Similar behavior across locations are consistent over time Members in each identified cluster behave very closely one another across time, just like one single entity  key idea of our solution Spatial-temporal clustering consistency between training data and monitoring data on different KPIs

Conclusion 18 3G monitoring is challenging due to its large scale and strong dynamics in both in time and spatial domains Tradeoff: threshold setting vs. alarm quality We propose an intelligent threshold aggregation solution Characterizes the location- and time-specific threshold trend of each individual NE with minimal threshold setting Operational experience with applying our solution has been very positive Threshold setting reduction up to 90% with less than 10% false/miss alarm rates

Backup slides 19

Common practice for monitoring for a large-scale network 20 Pre-defined (compute offline) thresholds is preferable Why not use a more sophisticated realtime-based dynamic thresholding? (e.g. exponential smoothing, regression analysis) If applied to each individual node in the network, it will create excessive computational burden on the monitoring system.

Pre-computing thresholds 21 First pass: remove anomalies. Holt-Winters algorithm taking into consideration diurnal, weekly pattern etc and individual network elements Second pass: compute thresholds. compute the mean and standard deviation based on the data without anomalies, and then compute thresholds: Yellow threshold: (mean – std) for dip KPIs (e.g. throughput), (mean + std) for spike KPIs (e.g. loss) Red threshold: (mean – 2*std) for dip KPIs (e.g. throughput), (mean + 2*std) for spike KPIs (e.g. loss)

Observation of similar threshold behavior (Optima KPI: RNC CPU load) 22 NRCSGAJTCR0R03:ATLNGAUYRNC001|9:10:11:12:13:14:15|14|64.56 Previously 14 thresholds can become one threshold Grouping across NEsacross hours RNC1: NRCSGAJTCR0R03 RNC2: ATLNGAUYRNC001 These two RNCs are under the same SGSN…

Overall picture 23 For each KPI, the algorithm outputs the compressed thresholds with NE & hour grouping results Threshold compress algorithm Dynamic thresholds (per-NE hourly) Input Compressed thresholds (with NE&hour Grouping results in XML format) Output Threshold Closeness (tunable parameter) KPI data Threshold computation (HW) remove anomalies Compute Thresholds dip:mean-2*std spike:mean+2*std Output format: 1) Consistent NE grouping over hours 2) Three hourly-groups per NE group ( RNC 1, RNC 2, RNC 3) ; (0am ~ 8 am) : thld 1 (RNC 1, RNC 2, RNC 3) ; (9am ~ 5 pm) : thld 2 (RNC 1, RNC 2, RNC 3) ; (6am ~ 12 pm) : thld 2 (RNC 6, RNC 8, RNC 9) ; (0am ~ 7am) : thld 3 (RNC 6, RNC 8, RNC 9) ; (8am ~ 7pm) : thld 4 (RNC 6, RNC 8, RNC 9) ; (8pm ~ 12pm) : thld 4