A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks Tian Bu 1, Jin Cao 1, Aiyou Chen 1, Patrick P. C. Lee 2 Bell Labs,


A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks
Tian Bu (1), Jin Cao (1), Aiyou Chen (1), Patrick P. C. Lee (2)
(1) Bell Labs, Alcatel-Lucent; (2) Columbia University
May 10, 2007

Outline
- Motivation
  - Why heavy-key detection?
  - What are the challenges?
- Sequential hashing scheme
  - Allows fast, memory-efficient heavy-key detection in high-speed networks
- Results of trace-driven simulation

Motivation
- Many anomalies in today’s networks: worms, DoS attacks, flash crowds, ...
- Input: a stream of packets as (key, value) pairs
  - Key: e.g., srcIPs, flows, ...
  - Value: e.g., data volume
- Goal: identify heavy keys that cause anomalies
  - Heavy hitters: keys with massive data in one period (e.g., flows that violate service agreements)
  - Heavy changers: keys with massive data change across two periods (e.g., sources that start DoS attacks)
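The two definitions above can be made concrete with a minimal exact baseline that keeps one counter per key. This is only an illustrative sketch: the function names and the `threshold` parameter are assumptions, and the per-key dictionary is exactly the approach that the rest of the talk treats as infeasible at line rate.

```python
from collections import Counter

def heavy_hitters(packets, threshold):
    """Keys whose total value within one period exceeds the threshold."""
    totals = Counter()
    for key, value in packets:
        totals[key] += value
    return {k for k, v in totals.items() if v > threshold}

def heavy_changers(period_a, period_b, threshold):
    """Keys whose total value changes by more than the threshold
    between two periods (hypothetical helper, exact per-key counting)."""
    a, b = Counter(), Counter()
    for key, value in period_a:
        a[key] += value
    for key, value in period_b:
        b[key] += value
    # Counter returns 0 for keys that were seen in only one period.
    return {k for k in a.keys() | b.keys() if abs(a[k] - b[k]) > threshold}
```

The memory used here grows with the number of distinct keys, which is what motivates the hash-based structures on the following slides.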

Challenge
- Keeping track of per-key values is infeasible
  - Number of keys = 2^32 if we keep track of source IPs
  - Number of keys = 2^104 if we keep track of 5-tuples (srcIP, dstIP, srcPort, dstPort, proto)
[Figure: a table of per-key counters, mapping keys 1, ..., N to counter values v1, v2, v3, ..., vN]

Goal
- Find heavy keys using a “smart” design:
  - Fast per-packet update
  - Fast identification of heavy keys
  - Memory-efficient
  - High accuracy

Previous Work
- Multi-stage filter [Estan & Varghese, 03]: covers only heavy hitter detection, not heavy changer detection
- Deltoids [Cormode & Muthukrishnan, 04]: covers both heavy hitter and heavy changer detection, but is not memory-efficient in general
- Reversible sketch [Schweller et al., 06]: space and time complexities of detection are sub-linear in the key space size

Our Contributions
- Derive the minimum memory requirement subject to a targeted error rate
- Propose a sequential hashing scheme that is memory-efficient and allows fast detection
- Propose an accurate method to estimate the values of heavy keys
- Show via trace-driven simulation that our scheme is more accurate than existing work

Minimum Memory Requirement
- How to feasibly keep track of per-key values?
- Use a hash array [Estan & Varghese, 2003]
  - M independent hash tables
  - K buckets in each table
[Figure: hash array with Tables 1, 2, ..., M, each holding buckets 1, 2, ..., K]

Minimum Memory Requirement: Record step
- For each packet with key x and value v:
  - Find the bucket in Table i by hashing x: h_i(x)
  - Increment the counter of each hashed bucket by v
[Figure: a packet (key x, value v) is hashed by h_1, h_2, ..., h_M into one bucket of each of Tables 1, 2, ..., M, and +v is added to each of those buckets]
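The record step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `HashArray` is hypothetical, and a salted BLAKE2b digest stands in for the M independent hash functions h_1, ..., h_M, which the slides do not specify.

```python
import hashlib

class HashArray:
    """M hash tables of K counters each (illustrative sketch)."""

    def __init__(self, num_tables, num_buckets):
        self.M = num_tables
        self.K = num_buckets
        self.counters = [[0] * num_buckets for _ in range(num_tables)]

    def bucket(self, table, key):
        # Assumption: salting one hash with the table index approximates
        # M independent hash functions.
        digest = hashlib.blake2b(
            repr(key).encode(),
            digest_size=8,
            salt=table.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.K

    def record(self, key, value):
        # One counter update per table, so the per-packet cost is O(M).
        for i in range(self.M):
            self.counters[i][self.bucket(i, key)] += value
```

Each call to `record` touches exactly M counters regardless of how many distinct keys have been seen, which is what keeps the per-packet update fast.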

Minimum Memory Requirement: Detection step
- Find heavy buckets, whose values (or changes) exceed the threshold
- Heavy keys: keys whose associated buckets are all heavy buckets
[Figure: hash array with Tables 1, 2, ..., M and buckets 1, 2, ..., K, with the heavy buckets highlighted]

Minimum Memory Requirement
- Input parameters:
  - N = size of the key space
  - H = maximum number of heavy keys
  - ε = error rate, Pr(a non-heavy key is treated as a heavy key)
- Objective: find all heavy keys subject to a targeted error rate ε
- Minimum memory requirement: the size of a hash array, M × K, is minimized when
  - K = H / ln(2)
  - M = log_2(N / (εH))
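A quick worked example shows the scale of these numbers. The sketch below simply evaluates K = H / ln(2) and M = log_2(N / (error rate × H)); the function name, the example parameter values, and the choice to round both results up to integers are assumptions made for illustration.

```python
import math

def min_hash_array(key_space_size, max_heavy_keys, error_rate):
    """Return (M, K): number of tables and buckets per table.

    Rounding up to whole tables and buckets is an assumption of this
    sketch; the slide states the real-valued optimum only.
    """
    K = math.ceil(max_heavy_keys / math.log(2))
    M = math.ceil(math.log2(key_space_size / (error_rate * max_heavy_keys)))
    return M, K

# Example with assumed inputs: 32-bit keys, 500 heavy keys, 1% error rate.
M, K = min_hash_array(2**32, 500, 0.01)
total_counters = M * K
```

With these assumed inputs the array needs 30 tables of 722 buckets, about 21,660 counters in total, compared with 2^32 counters for exact per-key tracking.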

How to identify heavy keys?
- Challenge: the hash array is irreversible (many-to-one mapping)
- Solution: enumerate all keys!! Computationally expensive
[Figure: hash array with Tables 1, 2, ..., M and buckets 1, 2, ..., K, with a heavy bucket highlighted]
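The brute-force solution can be sketched as follows, to make its cost visible. This is an illustration under stated assumptions: the helper names are hypothetical, a salted BLAKE2b digest stands in for the independent hash functions, and the key space is passed in explicitly as an iterable.

```python
import hashlib

def bucket(table, key, num_buckets):
    # Assumption: a table-salted digest approximates independent hashes.
    digest = hashlib.blake2b(
        repr(key).encode(), digest_size=8, salt=table.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % num_buckets

def record(counters, key, value):
    for i, table in enumerate(counters):
        table[bucket(i, key, len(table))] += value

def detect_by_enumeration(counters, key_space, threshold):
    """Return every key whose bucket is heavy in all tables.

    The loop visits the whole key space, so the cost is O(N * M).
    """
    return [
        key
        for key in key_space
        if all(
            table[bucket(i, key, len(table))] > threshold
            for i, table in enumerate(counters)
        )
    ]
```

Enumerating 256 keys is trivial, but the same loop over 2^32 source IPs or 2^104 5-tuples is what makes this approach impractical.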

Sequential Hashing Scheme
- Basic idea: smaller keys first, then larger keys
- Observation: if there are H heavy keys, then there are at most H unique sub-keys with respect to the heavy keys
  - Find all possible sub-keys of the H heavy keys
  - Enumeration of a sub-key space is easier
[Figure: heavy keys shown in a sub-IP space of size 2^8 (values up to 255) and in the entire IP space of size 2^32]

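Sequential Hashing Scheme: Record step
- Input: (key x, value v)
- Key x is split into sub-keys w1, w2, w3, ..., wD
- There are D hash arrays (Array 1, Array 2, ..., Array D), each made of hash tables with buckets 1, 2, ..., K
- The hashed bucket in each table is incremented by v
[Figure: key x, split into w1, w2, w3, ..., wD, feeding Arrays 1 to D with +v added to the hashed buckets]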

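Sequential Hashing Scheme: Detection step
- Array 1: try all w1's; (1 + intermediate error rate) × H candidate w1's remain
- Array 2: try all w2's; (1 + intermediate error rate) × H candidate w1w2's remain
- Array 3: try all w3's; (1 + intermediate error rate) × H candidate w1w2w3's remain
- ...
- Array D: try all wD's; (1 + targeted error rate) × H candidate keys w1w2…wD remain
[Figure: Arrays 1, 2, 3, ..., D, each with buckets 1, 2, ..., K, with the heavy buckets highlighted]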
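The record and detection steps of the sequential scheme can be sketched together. This is a simplified reading of the slides, not the authors' implementation. It assumes 32-bit keys split into D = 4 one-byte sub-keys, assumes that Array i hashes the prefix w1...wi, and uses the same number of tables in every array, whereas the slides suggest the table counts are tuned per array. All names are hypothetical.

```python
import hashlib

D = 4            # number of sub-keys (assumed: four bytes of an IPv4 address)
SUBKEY_BITS = 8  # bits per sub-key

def bucket(array, table, prefix, num_buckets):
    # Assumption: salting with (array, table) approximates independent hashes.
    digest = hashlib.blake2b(
        bytes(prefix), digest_size=8, salt=bytes([array, table])
    ).digest()
    return int.from_bytes(digest, "little") % num_buckets

def split(key):
    """Split a 32-bit key into sub-keys (w1, ..., wD), most significant first."""
    return tuple((key >> (SUBKEY_BITS * (D - 1 - i))) & 0xFF for i in range(D))

def new_arrays(num_tables, num_buckets):
    return [[[0] * num_buckets for _ in range(num_tables)] for _ in range(D)]

def record(arrays, key, value):
    words = split(key)
    for a, tables in enumerate(arrays):
        prefix = words[: a + 1]  # Array a+1 sees the prefix w1...w(a+1)
        for t, counters in enumerate(tables):
            counters[bucket(a, t, prefix, len(counters))] += value

def detect(arrays, threshold):
    """Grow candidate prefixes one sub-key at a time."""
    candidates = [()]
    for a, tables in enumerate(arrays):
        candidates = [
            p + (w,)
            for p in candidates
            for w in range(1 << SUBKEY_BITS)
            if all(
                c[bucket(a, t, p + (w,), len(c))] > threshold
                for t, c in enumerate(tables)
            )
        ]
    # Reassemble the surviving sub-key tuples into integer keys.
    return [
        sum(w << (SUBKEY_BITS * (D - 1 - i)) for i, w in enumerate(p))
        for p in candidates
    ]
```

In this sketch each stage enumerates only 256 extensions per surviving candidate, so the work is roughly D × 256 × (number of candidates) hash lookups per table rather than a scan of all 2^32 keys.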

Estimation
- Goal: find the values of heavy keys
  - Rank the importance of heavy keys
  - Eliminate more non-heavy keys
- Use maximum likelihood
  - Bucket values due to non-heavy keys ~ Weibull
  - Estimation is solved by linear programming

Recap
[Figure: data stream → record step → hash arrays (Array 1, ..., Array D) → detection step (with threshold) → candidate heavy keys → estimation → heavy keys + values]

Experiments
- Traces: Abilene data collected at an OC-192 link
  - 1 hour long, ~50 GB of traffic
- Evaluation approach: compare our scheme and Deltoids [Cormode & Muthukrishnan, 04], both of which use the same number of counters
- Metrics:
  - False positive rate = (# of non-heavy keys treated as heavy) / (# of returned keys)
  - False negative rate = (# of heavy keys missed) / (true # of heavy keys)
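The two metrics follow directly from set differences between the returned keys and the true heavy keys. A minimal sketch, with a hypothetical function name and with the convention (an assumption) that a rate is 0 when its denominator is empty:

```python
def error_rates(returned_keys, true_heavy_keys):
    """Return (false_positive_rate, false_negative_rate)."""
    returned = set(returned_keys)
    truth = set(true_heavy_keys)
    # Non-heavy keys reported as heavy, relative to everything reported.
    fp_rate = len(returned - truth) / len(returned) if returned else 0.0
    # Heavy keys that were missed, relative to all true heavy keys.
    fn_rate = len(truth - returned) / len(truth) if truth else 0.0
    return fp_rate, fn_rate
```

Note that the false positive rate is normalized by the number of returned keys, not by the number of non-heavy keys in the trace.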

Results - Heavy Hitter Detection
- Worst-case error rates:
  - Sequential hashing: 1.2% false positives, 0.8% false negatives
  - Deltoids: 10.5% false positives, 80% false negatives
[Figure: false positive and false negative rates of sequential hashing]

Results - Heavy Changer Detection
- Worst-case error rates:
  - Sequential hashing: 1.8% false positives, 2.9% false negatives
  - Deltoids: 1.2% false positives, 70% false negatives
[Figure: false positive and false negative rates of sequential hashing]

Summary of Results
- High accuracy of heavy-key detection while using a memory-efficient data structure
- Fast detection: on the order of seconds
- Accurate estimation: provides more accurate estimates than least-squares regression [Lee et al., 05]

Conclusions
- Derived the minimum memory requirement for heavy-key detection
- Proposed the sequential hashing scheme
  - Using a memory-efficient data structure
  - Allowing fast detection
  - Providing small false positive and false negative rates
- Proposed an accurate estimation method to reconstruct the values of heavy keys

Thank you

How to Determine H?
- H = maximum number of heavy keys
- H ≈ (total data volume) / threshold
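The reasoning behind this estimate, for heavy hitters, is that every heavy key carries more than the threshold, so the number of heavy keys cannot exceed the total volume divided by the threshold. A small sketch with a hypothetical function name and assumed integer byte counts:

```python
def max_heavy_keys(total_volume, threshold):
    """Upper bound on the number of keys whose value exceeds the threshold.

    If h keys each carry more than `threshold`, then
    h * threshold < total_volume, hence h < total_volume / threshold.
    """
    return total_volume // threshold

# Assumed example: 1000 units of traffic, threshold of 100 units per key.
bound = max_heavy_keys(1000, 100)
```

Choosing H from this bound makes the structure safe for the worst case in which the traffic is spread evenly across as many heavy keys as possible.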

Tradeoff Between Memory and Computation
- The intermediate error rate controls the tradeoff
  - Large intermediate error rate: fewer tables, more computation
  - Small intermediate error rate: more tables, less computation