Online Identification of Hierarchical Heavy Hitters Yin Zhang Joint work with Sumeet SinghSubhabrata Sen Nick DuffieldCarsten Lund.

Slides:



Advertisements
Similar presentations
Intrusion Detection Systems (I) CS 6262 Fall 02. Definitions Intrusion Intrusion A set of actions aimed to compromise the security goals, namely A set.
Advertisements

Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Sketch-based Change Detection Balachander Krishnamurthy (AT&T) Subhabrata Sen (AT&T) Yin Zhang (AT&T) Yan Chen (UCB/AT&T) ACM Internet Measurement Conference.
Characteristics of Network Traffic Flow Anomalies Paul Barford and David Plonka University of Wisconsin – Madison SIGCOMM IMW, 2001.
1 IP-Lookup and Packet Classification Advanced Algorithms & Data Structures Lecture Theme 08 – Part I Prof. Dr. Th. Ottmann Summer Semester 2006.
New Directions in Traffic Measurement and Accounting Cristian Estan – UCSD George Varghese - UCSD Reviewed by Michela Becchi Discussion Leaders Andrew.
Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices Qi Zhao*, Abhishek Kumar*, Jia Wang + and Jun (Jim) Xu* *College.
Fast Firewall Implementation for Software and Hardware-based Routers Lili Qiu, Microsoft Research George Varghese, UCSD Subhash Suri, UCSB 9 th International.
Fast Algorithms For Hierarchical Range Histogram Constructions
A Fast and Compact Method for Unveiling Significant Patterns in High-Speed Networks Tian Bu 1, Jin Cao 1, Aiyou Chen 1, Patrick P. C. Lee 2 Bell Labs,
Detecting DDoS Attacks on ISP Networks Ashwin Bharambe Carnegie Mellon University Joint work with: Aditya Akella, Mike Reiter and Srinivasan Seshan.
FLAME: A Flow-level Anomaly Modeling Engine
BOAT - Optimistic Decision Tree Construction Gehrke, J. Ganti V., Ramakrishnan R., Loh, W.
1 BGP Anomaly Detection in an ISP Jian Wu (U. Michigan) Z. Morley Mao (U. Michigan) Jennifer Rexford (Princeton) Jia Wang (AT&T Labs)
Nick Duffield, Patrick Haffner, Balachander Krishnamurthy, Haakon Ringberg Rule-Based Anomaly Detection on IP Flows.
1 Communication-Efficient Online Detection of Network-Wide Anomalies Ling Huang* XuanLong Nguyen* Minos Garofalakis § Joe Hellerstein* Michael Jordan*
Measuring Large Traffic Aggregates on Commodity Switches Lavanya Jose, Minlan Yu, Jennifer Rexford Princeton University, NJ 1.
Streaming Algorithms for Robust, Real- Time Detection of DDoS Attacks S. Ganguly, M. Garofalakis, R. Rastogi, K. Sabnani Krishan Sabnani Bell Labs Research.
A Framework for Classifying Denial of Service Attacks Alefiya Hussain, John Heidemann and Christos Papadopoulos presented by Nahur Fonseca NRG, June, 22.
1 Finding a Needle in a Haystack: Pinpointing Significant BGP Routing Changes in an IP Network Jian Wu (University of Michigan) Z. Morley Mao (University.
1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams Robert Schweller Ashish Gupta Elliot Parsons Yan Chen Computer.
Ph.D. DefenceUniversity of Alberta1 Approximation Algorithms for Frequency Related Query Processing on Streaming Data Presented by Fan Deng Supervisor:
Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.
Multi-Scale Analysis for Network Traffic Prediction and Anomaly Detection Ling Huang Joint work with Anthony Joseph and Nina Taft January, 2005.
Reverse Hashing for Sketch Based Change Detection in High Speed Networks Ashish Gupta Elliot Parsons with Robert Schweller, Theory Group Advisor: Yan Chen.
Cumulative Violation For any window size  t  Communication-Efficient Tracking for Distributed Cumulative Triggers Ling Huang* Minos Garofalakis.
Towards a High-speed Router-based Anomaly/Intrusion Detection System (HRAID) Zhichun Li, Yan Gao, Yan Chen Northwestern.
Measurement and Monitoring Nick Feamster Georgia Tech.
ANOMALY DETECTION AND CHARACTERIZATION: LEARNING AND EXPERIANCE YAN CHEN – MATT MODAFF – AARON BEACH.
Collaborating Against Common Enemies Sachin Katti Balachander Krishnamurthy and Dina Katabi AT&T Labs-Research & MIT CSAIL.
EL 933 Final Project Presentation Combining Filtering and Statistical Methods for Anomaly Detection Augustin Soule Kav´e SalamatianNina Taft.
A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks Yan Gao, Zhichun Li, Yan Chen Lab for Internet and Security Technology.
Dream Slides Courtesy of Minlan Yu (USC) 1. Challenges in Flow-based Measurement 2 Controller Configure resources1Fetch statistics2(Re)Configure resources1.
March 1, Packet Classification and Filtering for Network Processors JC Ho.
A Signal Analysis of Network Traffic Anomalies Paul Barford, Jeffrey Kline, David Plonka, and Amos Ron.
1 Towards Anomaly/Intrusion Detection and Mitigation on High-Speed Networks Yan Gao, Zhichun Li, Yan Chen Northwestern Lab for Internet and Security Technology.
A Signal Analysis of Network Traffic Anomalies Paul Barford with Jeffery Kline, David Plonka, Amos Ron University of Wisconsin – Madison Summer, 2002.
George Varghese (based on Cristi Estan’s work) University of California, San Diego May 2011 Internet traffic measurement: from packets to insight.
Attig 1 Automatically Inferring Patterns of Resource Consumption in Network Traffic In Proceedings of SIGCOMM 2003 Reviewed By Michael Attig
Tomo-gravity Yin ZhangMatthew Roughan Nick DuffieldAlbert Greenberg “A Northern NJ Research Lab” ACM.
Coarse-Grained Traffic Analysis in ISP Networks A Router-Based Approach Christian Martin Verizon.
Computer Security: Principles and Practice First Edition by William Stallings and Lawrie Brown Lecture slides by Lawrie Brown Chapter 8 – Denial of Service.
SIGCOMM 2002 New Directions in Traffic Measurement and Accounting Focusing on the Elephants, Ignoring the Mice Cristian Estan and George Varghese University.
Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.
Wire Speed Packet Classification Without TCAMs ACM SIGMETRICS 2007 Qunfeng Dong (University of Wisconsin-Madison) Suman Banerjee (University of Wisconsin-Madison)
Balajee Vamanan and T. N. Vijaykumar School of Electrical & Computer Engineering CoNEXT 2011.
Network Anomography Yin Zhang – University of Texas at Austin Zihui Ge and Albert Greenberg – AT&T Labs Matthew Roughan – University of Adelaide IMC 2005.
1 LD-Sketch: A Distributed Sketching Design for Accurate and Scalable Anomaly Detection in Network Data Streams Qun Huang and Patrick P. C. Lee The Chinese.
CINBAD CERN/HP ProCurve Joint Project on Networking 26 May 2009 Ryszard Erazm Jurga - CERN Milosz Marian Hulboj - CERN.
Mining Anomalies Using Traffic Feature Distributions Anukool Lakhina Mark Crovella Christophe Diot in ACM SIGCOMM 2005 Presented by: Sailesh Kumar.
High-Speed Policy-Based Packet Forwarding Using Efficient Multi-dimensional Range Matching Lakshman and Stiliadis ACM SIGCOMM 98.
Consensus Extraction from Heterogeneous Detectors to Improve Performance over Network Traffic Anomaly Detection Jing Gao 1, Wei Fan 2, Deepak Turaga 2,
ApproxHadoop Bringing Approximations to MapReduce Frameworks
Automated Worm Fingerprinting Authors: Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage Publish: OSDI'04. Presenter: YanYan Wang.
@ Carnegie Mellon Databases 1 Finding Frequent Items in Distributed Data Streams Amit Manjhi V. Shkapenyuk, K. Dhamdhere, C. Olston Carnegie Mellon University.
Network Anomography Yin Zhang Joint work with Zihui Ge, Albert Greenberg, Matthew Roughan Internet Measurement.
Dynamic Algorithms with Worst-case Performance for Packet Classification Pankaj Gupta and Nick McKeown Stanford University {pankaj,
Network Anomaly Detection Using Autonomous System Flow Aggregates Thienne Johnson 1,2 and Loukas Lazos 1 1 Department of Electrical and Computer Engineering.
SCREAM: Sketch Resource Allocation for Software-defined Measurement Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat (CoNEXT’15)
SketchVisor: Robust Network Measurement for Software Packet Processing
Constant Time Updates in Hierarchical Heavy Hitters
Jian Wu (University of Michigan)
Data Streaming in Computer Networking
Impact of Packet Sampling on Anomaly Detection Metrics
Transport Layer Systems Packet Classification
SCREAM: Sketch Resource Allocation for Software-defined Measurement
Packet Classification Using Coarse-Grained Tuple Spaces
Constant Time Updates in Hierarchical Heavy Hitters
Lu Tang , Qun Huang, Patrick P. C. Lee
Presentation transcript:

Online Identification of Hierarchical Heavy Hitters Yin Zhang Joint work with Sumeet SinghSubhabrata Sen Nick DuffieldCarsten Lund Internet Measurement Conference 2004

2 Motivation Traffic anomalies are common –DDoS attacks, Flash crowds, worms, failures Traffic anomalies are complicated –Multi-dimensional  may involve multiple header fields E.g. src IP AND port 1214 (KaZaA) Looking at individual fields separately is not enough! –Hierarchical  Evident only at specific granularities E.g /32, /24, /16, /8 Looking at fixed aggregation levels is not enough! Want to identify anomalous traffic aggregates automatically, accurately, in near real time –Offline version considered by Estan et al. [SIGCOMM03]

3 Challenges Immense data volume (esp. during attacks) –Prohibitive to inspect all traffic in detail Multi-dimensional, hierarchical traffic anomalies –Prohibitive to monitor all possible combinations of different aggregation levels on all header fields Sampling (packet level or flow level) –May wash out some details False alarms –Too many alarms = info “snow”  simply get ignored Root cause analysis –What do anomalies really mean?

4 Approach Prefiltering extracts multi- dimensional hierarchical traffic clusters –Fast, scalable, accurate –Allows dynamic drilldown Robust heavy hitter & change detection –Deals with sampling errors, missing values Characterization (ongoing) –Reduce false alarms by correlating multiple metrics –Can pipe to external systems Prefiltering (extract clusters) Identification (robust HH & CD) Characterization Input Output

5 Prefiltering Input – –Bytes (we can also use other metrics) Output –All traffic clusters with volume above (epsilon * total_volume) ( cluster ID, estimated volume ) –Traffic clusters: defined using combinations of IP prefixes, port ranges, and protocol Goals –Single Pass –Efficient (low overhead) –Dynamic drilldown capability

6 Dynamic Drilldown via 1-DTrie At most 1 update per flow Split level when adding new bytes causes bucket >= T split Invariant: traffic trapped at any interior node < T split stage 1 (first octet) stage 2 (second octet) stage 3 (third octet) field stage 4 (last octet)

7 1-D Trie Data Structure Reconstruct interior nodes (aggregates) by summing up the children Reconstruct missed value by summing up traffic trapped at ancestors Amortize the update cost

8 1-D Trie Performance Update cost –1 lookup + 1 update Memory –At most 1/T split internal nodes at each level Accuracy: For any given T > d*T split –Captures all flows with metric >= T –Captures no flow with metric < T-d*T split

9 Extending 1-D Trie to 2-D: Cross-Producting Update(k1, k2, value) In each dimension, find the deepest interior node (prefix): (p1, p2) –Can be done using longest prefix matching (LPM) Update a hash table using key (p1, p2): –Hash table: cross product of 1-D interior nodes Reconstruction can be done at the end p1 = f(k1) p2 = f(k2) totalBytes{ p1, p2 } += value

10 Cross-Producting Performance Update cost: –2 X (1-D update cost) + 1 hash table update. Memory –Hash table size bounded by (d/T split ) 2 –In practice, generally much smaller Accuracy: For any given T > d*T split –Captures all flows with metric >= T –Captures no flow with metric < T- d*T split

11 HHH vs. Packet Classification Similarity –Associate a rule for each node  finding fringe nodes becomes PC Difference –PC:rules given a priori and mostly static –HHH:rules generated on the fly via dynamic drilldown Adapted 2 more PC algorithms to 2-D HHH –Grid-of-tries & Rectangle Search –Only require O(d/T split ) memory Decompose 5-D HHH into 2-D HHH problems

12 Change Detection Data –5 minute reconstructed cluster series –Can use different interval size Approach –Classic time series analysis Big change –Significant departure from forecast

13 Change Detection: Details Holt-Winters –Smooth + Trend + (Seasonal) Smooth:Long term curve Trend:Short term trend (variation) Seasonal:Daily / Weekly / Monthly effects –Can plug in your favorite method Joint analysis on upper & lower bounds –Can deal with missing clusters ADT provides upper bounds (lower bound = 0) –Can deal with sampling variance Translate sampling variance into bounds

14 Evaluation Methodology Dataset description –Netflow from a tier-1 ISP Algorithms tested TraceDuration#routers#recordsVolume ISP-100K3 min1100 K66.5 MB ISP-1day1 day2332 M223.5 GB ISP-1mon1 month27.5 G5.2 TB Baseline (Brute-force) Sketchsk, sk2 Lossy Countinglc Our algorithms Cross-Productingcp Grid-of-triesgot Rectangle Searchrs

15 Runtime Costs ISP-100K (gran = 1)ISP-100K (gran = 8) We are an order of magnitude faster

16 Normalized Space ISP-100K (gran = 1)ISP-100K (gran = 8) normalized space = actual space / [ (1/epsilon) (32/gran) 2 ] gran = 1: we need less / comparable space gran = 8: we need more space but total space is small

17 HHH Accuracy False Negatives (ISP-100K)False Positives (ISP-100K) HHH detection accuracy comparable to brute-force

18 Change Detection Accuracy ISP-1day, router 2 Top N change overlap is above 97% even for very large N

19 Effects of Sampling ISP-1day, router 2 Accuracy above 90% with 90% data reduction

20 Some Detected Changes

21 Next Steps Characterization –Aim: distinguish events using sufficiently rich set of metrics E.g. DoS attacks looks different from flash crowd (bytes/flow smaller in attack) –Metrics: # flows, # bytes, # packets, # SYN packets – ↑ SYN, ↔ bytes  DoS?? – ↑ packets, ↔ bytes  DoS?? – ↑ packets, in multiple /8  DoS? Worm? Distributed detection –Our summary data structures can easily support aggregating data collected from multiple locations

22 Thank you!

23 HHH Accuracy Across Time ISP-1mon (router 1)ISP-1mon (router 2)