Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.

Slides:



Advertisements
Similar presentations
Monitoring very high speed links Gianluca Iannaccone Sprint ATL joint work with: Christophe Diot – Sprint ATL Ian Graham – University of Waikato Nick McKeown.
Advertisements

IP Router Architectures. Outline Basic IP Router Functionalities IP Router Architectures.
New Directions in Traffic Measurement and Accounting Cristian Estan – UCSD George Varghese - UCSD Reviewed by Michela Becchi Discussion Leaders Andrew.
Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices Qi Zhao*, Abhishek Kumar*, Jia Wang + and Jun (Jim) Xu* *College.
1 CONGESTION CONTROL. 2 Congestion Control When one part of the subnet (e.g. one or more routers in an area) becomes overloaded, congestion results. Because.
A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks Tian Bu 1, Jin Cao 1, Aiyou Chen 1, Patrick P. C. Lee 2 Bell Labs,
Estimating TCP Latency Approximately with Passive Measurements Sriharsha Gangam, Jaideep Chandrashekar, Ítalo Cunha, Jim Kurose.
Detecting DDoS Attacks on ISP Networks Ashwin Bharambe Carnegie Mellon University Joint work with: Aditya Akella, Mike Reiter and Srinivasan Seshan.
1 Advancing Supercomputer Performance Through Interconnection Topology Synthesis Yi Zhu, Michael Taylor, Scott B. Baden and Chung-Kuan Cheng Department.
Marios Iliofotou (UC Riverside) Brian Gallagher (LLNL)Tina Eliassi-Rad (Rutgers University) Guowu Xi (UC Riverside)Michalis Faloutsos (UC Riverside) ACM.
Topology Control for Effective Interference Cancellation in Multi-User MIMO Networks E. Gelal, K. Pelechrinis, T.S. Kim, I. Broustis Srikanth V. Krishnamurthy,
Topology Generation Suat Mercan. 2 Outline Motivation Topology Characterization Levels of Topology Modeling Techniques Types of Topology Generators.
Resilient Peer-to-Peer Streaming Paper by: Venkata N. Padmanabhan Helen J. Wang Philip A. Chou Discussion Leader: Manfred Georg Presented by: Christoph.
The structure of the Internet. How are routers connected? Why should we care? –While communication protocols will work correctly on ANY topology –….they.
Volkan Cevher, Marco F. Duarte, and Richard G. Baraniuk European Signal Processing Conference 2008.
1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams Robert Schweller Ashish Gupta Elliot Parsons Yan Chen Computer.
Polytechnic University,ECE Department1 Detection of “Hot Spots” Paper Title : Joint Data Streaming and Sampling Techniques for Detection of Super Sources.
Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
1 Distributed Online Simultaneous Fault Detection for Multiple Sensors Ram Rajagopal, Xuanlong Nguyen, Sinem Ergen, Pravin Varaiya EECS, University of.
Reverse Hashing for Sketch Based Change Detection in High Speed Networks Ashish Gupta Elliot Parsons with Robert Schweller, Theory Group Advisor: Yan Chen.
Tracking Moving Objects in Anonymized Trajectories Nikolay Vyahhi 1, Spiridon Bakiras 2, Panos Kalnis 3, and Gabriel Ghinita 3 1 St. Petersburg State University.
Measurement and Monitoring Nick Feamster Georgia Tech.
Rethinking Internet Traffic Management: From Multiple Decompositions to a Practical Protocol Jiayue He Princeton University Joint work with Martin Suchara,
Core Stateless Fair Queueing Stoica, Shanker and Zhang - SIGCOMM 98 Rigorous fair Queueing requires per flow state: too costly in high speed core routers.
Fast and Robust Worm Detection Algorithm Tian Bu Aiyou Chen Scott Vander Wiel Thomas Woo bearhsu.
Core Stateless Fair Queueing Stoica, Shanker and Zhang - SIGCOMM 98 Fair Queueing requires per flow state: too costly in high speed core routers Yet, some.
Tradeoffs in CDN Designs for Throughput Oriented Traffic Minlan Yu University of Southern California 1 Joint work with Wenjie Jiang, Haoyuan Li, and Ion.
George Varghese (based on Cristi Estan’s work) University of California, San Diego May 2011 Internet traffic measurement: from packets to insight.
The Effects of Systemic Packets Loss on Aggregate TCP Flows Thomas J. Hacker May 8, 2002 Internet 2 Member Meeting.
Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms Zhichun Li 1, Lanjia Wang 2, Yan Chen 1 and Judy Fu 3 1 Lab.
An algorithm for dynamic spectrum allocation in shadowing environment and with communication constraints Konstantinos Koufos Helsinki University of Technology.
SIGCOMM 2002 New Directions in Traffic Measurement and Accounting Focusing on the Elephants, Ignoring the Mice Cristian Estan and George Varghese University.
Tracking with Unreliable Node Sequences Ziguo Zhong, Ting Zhu, Dan Wang and Tian He Computer Science and Engineering, University of Minnesota Infocom 2009.
Join Synopses for Approximate Query Answering Swarup Achrya Philip B. Gibbons Viswanath Poosala Sridhar Ramaswamy Presented by Bhushan Pachpande.
DoWitcher: Effective Worm Detection and Containment in the Internet Core S. Ranjan et. al in INFOCOM 2007 Presented by: Sailesh Kumar.
Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience Zhichun Li, Manan Sanghi, Yan Chen, Ming-Yang Kao and Brian.
1 Optical Burst Switching (OBS). 2 Optical Internet IP runs over an all-optical WDM layer –OXCs interconnected by fiber links –IP routers attached to.
ACN: RED paper1 Random Early Detection Gateways for Congestion Avoidance Sally Floyd and Van Jacobson, IEEE Transactions on Networking, Vol.1, No. 4, (Aug.
TOMA: A Viable Solution for Large- Scale Multicast Service Support Li Lao, Jun-Hong Cui, and Mario Gerla UCLA and University of Connecticut Networking.
Large-Scale IP Traceback in High-Speed Internet : Practical Techniques and Theoretical Foundation Jun (Jim) Xu Networking & Telecommunications Group College.
Mapping Internet Sensors with Probe Response Attacks Authors: John Bethencourt, Jason Franklin, Mary Vernon Published At: Usenix Security Symposium, 2005.
Challenges and Opportunities Posed by Power Laws in Network Analysis Bruno Ribeiro UMass Amherst MURI REVIEW MEETING Berkeley, 26 th Oct 2011.
Towards Efficient Large-Scale VPN Monitoring and Diagnosis under Operational Constraints Yao Zhao, Zhaosheng Zhu, Yan Chen, Northwestern University Dan.
CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN.
Detection of Routing Loops and Analysis of Its Causes Sue Moon Dept. of Computer Science KAIST Joint work with Urs Hengartner, Ashwin Sridharan, Richard.
Trajectory Sampling for Direct Traffic Oberservation N.G. Duffield and Matthias Grossglauser IEEE/ACM Transactions on Networking, Vol. 9, No. 3 June 2001.
Mohamed Hefeeda 1 School of Computing Science Simon Fraser University, Canada Efficient k-Coverage Algorithms for Wireless Sensor Networks Mohamed Hefeeda.
Vladimír Smotlacha CESNET High-speed Programmable Monitoring Adapter.
1 Randomization, Derandomization, and Parallelization --- take the MIS problem as a demonstration Speaker: Hong Chih-Duo Advisor: Chao Kuen-Mao National.
Intradomain Traffic Engineering By Behzad Akbari These slides are based in part upon slides of J. Rexford (Princeton university)
Efficient Cache Structures of IP Routers to Provide Policy-Based Services Graduate School of Engineering Osaka City University
Data Structures and Algorithms in Parallel Computing Lecture 2.
Automated Worm Fingerprinting Authors: Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage Publish: OSDI'04. Presenter: YanYan Wang.
An Efficient Gigabit Ethernet Switch Model for Large-Scale Simulation Dong (Kevin) Jin.
Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms Zhichun Li 1, Lanjia Wang 2, Yan Chen 1 and Judy Fu 3 1 Lab.
Packet Classification Using Multidimensional Cutting Sumeet Singh (UCSD) Florin Baboescu (UCSD) George Varghese (UCSD) Jia Wang (AT&T Labs-Research) Reviewed.
2009/6/221 BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure- Independent Botnet Detection Reporter : Fong-Ruei, Li Machine.
Access Link Capacity Monitoring with TFRC Probe Ling-Jyh Chen, Tony Sun, Dan Xu, M. Y. Sanadidi, Mario Gerla Computer Science Department, University of.
SketchVisor: Robust Network Measurement for Software Packet Processing
On-line Detection of Real Time Multimedia Traffic
Pagerank and Betweenness centrality on Big Taxi Trajectory Graph
A Study of Group-Tree Matching in Large Scale Group Communications
A paper on Join Synopses for Approximate Query Answering
CONGESTION CONTROL.
Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform
Network-Wide Routing Oblivious Heavy Hitters
Alan Kuhnle*, Victoria G. Crawford, and My T. Thai
Lu Tang , Qun Huang, Patrick P. C. Lee
PCAV: Evaluation of Parallel Coordinates Attack Visualization
Presentation transcript:

Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College of Computing Georgia Institute of Technology Joint work with A. Kumar(Georgia Tech), L. Li (Bell lab), J. Wang(AT&T – Research) and J. Xu(Georgia Tech) 2nd IEEE International Workshop on Networking Meets Databases

1/17 Monitoring and analyzing aggregate traffic Essential for detecting “global” events –important in network measurement/management We focus on : detecting common content –motivations early stage detection of Internet worms/viruses/spam s flash crowd event detection in web browsing or P2P traffic –challenge : hard to detect by per-link monitoring signal is usually too weak to be detected locally correlation of traffic among many links is required our approach : distributed data streaming

2/17 Problem statement Aligned case (e.g. Web browsing, CodeRed worm) Unaligned case (e.g. worms such as Nimda or Sircam) Assumption : fixed MTU size is used –few popular packet sizes in Internet (> 500) (e.g. 576,1500) –our algorithm can be extended later header 1common substring header 2common substring packetization header 1common substring header 2common substring

3/17 Solution framework Data Collection Module Data Collection Module Packet stream 1. Update Packet stream 1. Update Monitoring station 2 Monitoring station m m x n 3. Compose matrix Data Collection Module Packet stream 1. Update Monitoring station 1 1 x n Data Analysis Module 2. Transmit digest Central Monitoring Station 4. Analyze & sound alarm Feedback to stations Valued customers

4/17 Challenges & Requirements Data Collection Module –fast enough for high-speed links –much smaller size of digests than original traffic –digests should contain enough information Data Analysis Module –low computational complexity –high accuracy in identifying patterns

5/17 –size of bitmap n determined by link speed & measurement epoch Assuming 1000 bits average packet size, 4 Mbits for OC-48 (2.4 Gbps) is enough for 1 sec traffic –hash packets only if enough payload exist (  500bytes) Aligned case : Data Collection Module 1. pkt arrives IP header contents 3. Set a bit 1 2. Get an index hash n

6/17 Aligned case : Data Analysis Module – a x b submatrix : common content of b packets seen by a points –problem of finding the largest all-1 submatrix in general case equivalent to “maximum edge biclique problem” : known to NP-hard –our case is more tractable : each element is drawn by Bernoulli trial our polynomial greedy algorithm utilizes this property a 123b 12345n m m x n bitmap composed after receiving 1 x n digests from m routers red 1s : a x b pattern gray 1s: noises Header 1Common substring Header 2Common substring

7/17 Case 2 : Unaligned case Much more challenging than aligned case –same content can be packetized differently –detecting correlations among bitmaps is more complex assuming 576 MTU size & prefix length ~ uniform[0,535], matching probability between bitmaps is only 1/536. Our approaches –data collection amplify the weakened signal due to the noise (random offsets) design two techniques : offset sampling and flow splitting –data analysis design new statistical data analysis techniques Header 1Common substring Header 2Common substring prefix length

8/17 Unaligned case: Data Collection Module Offset sampling –use k bitmaps : can magnify signal strength to around k 2 –update bitmaps using k random offsets in [0,535] A[1] A[2] A[k] A[3] IP header contents Hash len 1 Offset[1] Offset[2] Offset[k] Hash 1 len Hash len 1 1

9/17 Unaligned case: Data Collection Module flow splitting –split flows into t different groups of arrays –increases correlation between two matching arrays IP header contents flow label Hash group 1group 2group t flow 21,6,3… flow 2,10,9…

10/17 Overview of statistical techniques Phase transition theory in ER random graph –ER graph G(n,p) n : number of vertices p : probability that two vertices are connected by an edge –phase transition theory if p  1/n, all connected components are of size O(log n) if p > 1/n, a giant connected components of  (n) begins to emerge p  1/n p > 1/n

11/17 Unaligned case: Data Analysis Module Converting bitmaps to a graph 1)vertex mapping 2 3 m group 1group 2group t 1 12t 2)edge mapping if matching_test(group1,group2) = positive  add an edge matching_test

12/17 Unaligned case: Data Analysis Module Case 1 - no large common content - output : random graph G(n,p) (by tuning thresholds) m 12t Case 2 - contains popular common content - output : random graph G(n,p) + preferential attachment m 12t

13/17 Overview of statistical techniques Erdös-Renyi test : pattern presence test –procedure 1)pairwise correlation is computed 2)construct a graph with scaling factor p ( < 1/n ) 3)test {the size of largest connected component >> O(log n) } –can provide very accurate binary(yes/no) answer Finding the core : identifying nodes –greedy algorithm to find most of the nodes that saw the common content is proposed

14/17 Computational complexity Major bottleneck : bitmap-to-graph conversion –memory read-only operation : can be highly parallelized use multiple processors use specially designed hardwares –reduce detection accuracy to decrease the complexity can reduce number of monitoring points or groups find pattern in only sampled vertices

15/17 Unaligned case: Evaluation Simulation : monitor 800 links x 128 groups each = 102,400 vertices Erdös-Renyi test –common content is packetized into 100 packets –p = 0.65/10 5 ( < phase transition probability 1/n = 1.024/10 5 ) –varing # of vertices seen the common content 120 vertices seen (15%) 140 vertices seen (17.5%)

16/17 Unaligned case: Evaluation Finding the core Evaluation using tier-1 ISP trace –demonstrates the effectiveness of the proposed algorithms # packets in common content # of vertices seen the common content Average Core Size Average False Negative Average False Positive (50%) 112.1(75%) 154.4(90%) (50%) 59.3(75%) 81.8(90%) (50%) 38.5(75%) 51.9(90%)

17/17 Conclusion designed a distributed algorithms to detect common content in Internet traffic proposed distributed data streaming approach algorithms are shown to be effective through extensive simulations

Thank you !