Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.

Presentation transcript:

Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications
Robert Schweller (1), Zhichun Li (1), Yan Chen (1), Yan Gao (1), Ashish Gupta (1), Yin Zhang (2), Peter Dinda (1), Ming-Yang Kao (1), Gokhan Memik (1)
(1) Lab for Internet and Security Technology (LIST), Northwestern Univ.
(2) University of Texas at Austin

The Spread of Sapphire/Slammer Worms

Motivation (online change detection)
Online network anomaly/intrusion detection over high-speed links
–Small memory usage
–Small number of memory accesses per packet
–Scalable to large key space size
Primitives for online anomaly detection
–Heavy hitters (lots of prior work)
–Heavy changes: enabler for aggregate queries over multiple data streams
Asymmetric routing demands spatial aggregation
Time series analysis (TSA) needs temporal aggregation

Outline
–Background on k-ary sketch
–Reversible sketch problem
–Modular hashing
–IP mangling
–Reverse hashing
–Evaluation
–Conclusion

K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables (1, …, j, …, H), each with K buckets indexed 0, 1, …, K-1]

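k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
[Figure: H hash tables T_1, …, T_j, …, T_H, each with K buckets 0, 1, …, K-1; key k maps to bucket h_j(k) in table j]
APIs:
–Update (k, u): T_j[h_j(k)] += u (for all j)
–Estimate v(S, k): sum of updates for key k
–S = COMBINE(α, S1, β, S2): linear combination of sketches, α × S1 + β × S2 = S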
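The three APIs can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the hash family `((a*k + b) mod p) mod K`, the class name `KarySketch`, and the seed handling are assumptions made here for the example. The estimator (bucket value corrected by the table total, then the median over the H tables) follows the formulation in the cited k-ary sketch work.

```python
import random
from statistics import median


class KarySketch:
    """Toy k-ary sketch: H hash tables with K buckets each."""

    PRIME = (1 << 61) - 1  # modulus for the illustrative hash family (assumption)

    def __init__(self, H=5, K=4096, seed=0):
        rng = random.Random(seed)
        self.H, self.K = H, K
        # Hypothetical hash family: h_j(k) = ((a_j * k + b_j) mod p) mod K
        self.coeffs = [(rng.randrange(1, self.PRIME), rng.randrange(self.PRIME))
                       for _ in range(H)]
        self.T = [[0.0] * K for _ in range(H)]

    def bucket(self, j, key):
        a, b = self.coeffs[j]
        return ((a * key + b) % self.PRIME) % self.K

    def update(self, key, u):
        """UPDATE: T_j[h_j(k)] += u for all j."""
        for j in range(self.H):
            self.T[j][self.bucket(j, key)] += u

    def estimate(self, key):
        """ESTIMATE: median over tables of the collision-corrected bucket value."""
        per_table = []
        for j in range(self.H):
            total = sum(self.T[j])
            v = self.T[j][self.bucket(j, key)]
            per_table.append((v - total / self.K) / (1 - 1 / self.K))
        return median(per_table)

    @staticmethod
    def combine(alpha, s1, beta, s2):
        """COMBINE: bucket-wise alpha*S1 + beta*S2 (same hash functions required)."""
        if s1.coeffs != s2.coeffs or s1.K != s2.K:
            raise ValueError("sketches must share hash functions and size")
        out = KarySketch(s1.H, s1.K)
        out.coeffs = list(s1.coeffs)
        out.T = [[alpha * x + beta * y for x, y in zip(r1, r2)]
                 for r1, r2 in zip(s1.T, s2.T)]
        return out


# Two time intervals recorded with the same hash functions.
before, after = KarySketch(seed=7), KarySketch(seed=7)
for k in range(100):
    before.update(k, 5)
    after.update(k, 5)
after.update(42, 490)          # key 42 changes heavily in the second interval

change = KarySketch.combine(1, after, -1, before)
change_of_42 = change.estimate(42)
```

Because the sketch is linear, subtracting two sketches bucket by bucket gives a sketch of the per-key change, which is what the heavy change detection step works on.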

Reverse Sketch Problem
Main problem
–Cannot efficiently report keys with heavy change: INFERENCE(S, t)
–Important function for anomaly detection!
Our contribution
–Determine the set of keys that have “large” estimates in a sketch

Reversible sketch framework
–Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value stored)
–Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys


Taking Intersections
Intersect A_1, A_2, A_3, A_4, A_5
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
E[false positives] << 1

The problem with simple intersection
Each set A_i (the keys that map to the heavy bucket in hash table i) can be very large!
H = 5, K = 2^12, #keys = 2^32 (IP addresses)
|A_1| = 2^32 / 2^12 = 2^20
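The two numbers on the intersection slides can be checked with a short calculation. The false-positive figure below assumes that the H hash functions are independent and uniform over the K buckets; that assumption is made here for the arithmetic and is not stated on the slides.

```python
from fractions import Fraction

H = 5          # number of hash tables
K = 2 ** 12    # buckets per table
n = 2 ** 32    # size of the key space (IPv4 addresses)

# Keys that map to one given bucket of one table: |A_i| = n / K
reverse_set_size = n // K

# A key other than the heavy one survives the intersection only if it lands
# in the heavy bucket in all H tables: probability (1/K)^H per key.
expected_false_positives = Fraction(n - 1, K ** H)

print(reverse_set_size == 2 ** 20)
print(float(expected_false_positives))
```

So the intersection itself is accurate, but computing it naively means materializing and intersecting five sets of about a million keys each.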

The problem with simple intersection
Each set A_i can be very large!
Solution: modular hashing

Modular hashing reduces the set size
[Figure: h() maps a 32-bit key, drawn as 8-bit words, to a 12-bit bucket index]

Modular hashing reduces the set size
[Figure: the 32-bit key is split into four 8-bit words, hashed separately by h_1(), h_2(), h_3(), h_4()]
Greatly reduces size of reverse mapped sets
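A minimal sketch of modular hashing, assuming (as the 12-bit index and four 8-bit words suggest) that each word is hashed to 3 bits and the four results are concatenated. The lookup-table hash functions and the names `make_tables`, `split_words`, and `modular_hash` are hypothetical choices for this example.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3   # 4 words of 8 bits, 3 output bits each


def make_tables(seed):
    """One random 8-bit -> 3-bit lookup table per word (illustrative hash functions)."""
    rng = random.Random(seed)
    return [[rng.randrange(1 << OUT_BITS) for _ in range(1 << WORD_BITS)]
            for _ in range(WORDS)]


def split_words(key):
    """Split a 32-bit key into four 8-bit words, most significant first."""
    mask = (1 << WORD_BITS) - 1
    return [(key >> (WORD_BITS * (WORDS - 1 - w))) & mask for w in range(WORDS)]


def modular_hash(tables, key):
    """Hash each word separately and concatenate the chunks into a 12-bit index."""
    index = 0
    for w, word in enumerate(split_words(key)):
        index = (index << OUT_BITS) | tables[w][word]
    return index


tables = make_tables(seed=1)
bucket = modular_hash(tables, 0xC0A80001)   # 192.168.0.1
```

Each 3-bit chunk of the bucket index depends on exactly one word of the key, which is what lets the reverse mapping be computed one word at a time.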

Modular hashing reduces the set size
[Figure: heavy buckets b_1, …, b_5, one per hash table]
A_1: 2^5 * 2^5 * 2^5 * 2^5
Intersection: only 32 elements per word set

Modular hashing reduces the set size
A_1: 2^5 * 2^5 * 2^5 * 2^5
A_2: 2^5 * 2^5 * 2^5 * 2^5
[Figure: the intersection of A_1 and A_2, taken across heavy buckets b_1, …, b_5]
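The word-by-word intersection can be sketched as follows. To reproduce the 2^5 = 32 elements per word set exactly, the example builds balanced per-word hash tables from a random permutation; that construction, the single heavy key, and all function names are assumptions for illustration.

```python
import random

H, WORDS, WORD_BITS, OUT_BITS = 5, 4, 8, 3


def balanced_word_hash(rng):
    """Random 8-bit -> 3-bit table in which every output has exactly 32 preimages."""
    perm = list(range(1 << WORD_BITS))
    rng.shuffle(perm)
    return [p >> (WORD_BITS - OUT_BITS) for p in perm]


def split_words(key):
    mask = (1 << WORD_BITS) - 1
    return [(key >> (WORD_BITS * (WORDS - 1 - w))) & mask for w in range(WORDS)]


def bucket_chunks(table, key):
    """The four 3-bit chunks whose concatenation is the key's 12-bit bucket index."""
    return [table[w][word] for w, word in enumerate(split_words(key))]


def reverse_word_sets(table, chunks):
    """For each word position, all 8-bit words that hash to the given chunk."""
    return [{x for x in range(1 << WORD_BITS) if table[w][x] == chunks[w]}
            for w in range(WORDS)]


rng = random.Random(0)
tables = [[balanced_word_hash(rng) for _ in range(WORDS)] for _ in range(H)]

key = 0xC0A80001  # hypothetical heavy key, 192.168.0.1
per_table = [reverse_word_sets(t, bucket_chunks(t, key)) for t in tables]

# Intersect word by word across the H tables: sets of 32 bytes, not 2^20 keys.
word_candidates = [set.intersection(*(sets[w] for sets in per_table))
                   for w in range(WORDS)]
```

Each A_i is now represented implicitly as a product of four 32-element sets, so the work per heavy bucket drops from about a million keys to 4 × 32 word values.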

Problem: Too many collisions
[Figure: 32-bit keys hashed to 12-bit bucket indices]

Problem: Too many collisions
[Figure: 32-bit keys hashed to 12-bit bucket indices]
Solution: IP mangling with GF (Galois extension field)
IP mangling: a bijective mapping function for breaking the key space continuity
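To make the idea of a bijective mangling function concrete, here is a deliberately simpler stand-in: multiplication by an odd constant modulo 2^32, which is invertible because an odd number has a multiplicative inverse modulo a power of two. This is not the GF construction named on the slide, and the multiplier `A` is an arbitrary odd value chosen for the example. Note that with this stand-in the low-order bits of the output depend only on the low-order bits of the input, so it scrambles less thoroughly than a finite-field mapping would.

```python
MASK32 = (1 << 32) - 1
A = 0x9E3779B1                 # hypothetical odd multiplier
A_INV = pow(A, -1, 1 << 32)    # modular inverse (Python 3.8+)


def mangle(ip):
    """Bijective map on 32-bit keys: multiply by an odd constant mod 2^32."""
    return (A * ip) & MASK32


def unmangle(mangled):
    """Inverse of mangle(), used after reverse hashing to recover the real key."""
    return (A_INV * mangled) & MASK32


# Two adjacent addresses share their top three bytes before mangling ...
ip_a, ip_b = 0xC0A80001, 0xC0A80002
# ... but not after, so modular hashing no longer sees identical leading words.
top_bytes = (mangle(ip_a) >> 24, mangle(ip_b) >> 24)
```

Because the mapping is a bijection, recording uses `mangle` before hashing and detection applies `unmangle` to every inferred key, with no loss of information.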


Handling Multiple Intersections…
[Figure: two groups of heavy buckets b_1, …, b_5 across the hash tables]
2^H different intersections: much more difficult
Solution: reverse hashing algorithms
–Step 1: Reverse hashing for each module
–Step 2: Infer the whole key through bucket index matching among candidates from each module

Reverse Hashing for Each Module
H = 5, r = 1, K = 2^12 (r is the tolerance level)
Take the first word as an example:
–Candidate set of the first word in hash table i, for i = 1, …, 5: {2,3,5}, {2,6,9,10}, {0,2,3}, {2,3,8,10}, {3,6,7,9}
–All possible values of the first word in the sketch: {2}, {2,3}
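One reading of the tolerance level r is that a word value is kept as a candidate if it appears in the candidate sets of at least H - r hash tables. Under that reading (an interpretation of the slide, labeled as such), the five example sets give the following result. The function name `word_candidates` is made up for this sketch.

```python
from collections import Counter


def word_candidates(per_table_sets, r):
    """Values that appear in at least H - r of the H per-table candidate sets."""
    H = len(per_table_sets)
    counts = Counter(x for s in per_table_sets for x in s)
    return {x for x, c in counts.items() if c >= H - r}


# Candidate sets for the first word in hash tables 1..5, copied from the slide.
first_word_sets = [{2, 3, 5}, {2, 6, 9, 10}, {0, 2, 3}, {2, 3, 8, 10}, {3, 6, 7, 9}]

strict = word_candidates(first_word_sets, r=0)     # must appear in all 5 tables
tolerant = word_candidates(first_word_sets, r=1)   # may be missing from 1 table
```

With r = 1 both 2 and 3 survive, since each is missing from exactly one of the five sets; with r = 0 no value survives, which shows why some tolerance is useful when a heavy key falls just under the threshold in one table.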

Bucket Index Matrix of Candidates
H = 5, r = 1, K = 2^12
For each x in I_1, we can get B_1(x), a vector of the heavy bucket sets which x hashes to
[Figure: heavy buckets b_11, b_12, b_21, b_22, b_31, b_32, b_41, b_42, b_51, … across the hash tables, shown for three candidates; *.*.* hash to the red heavy buckets]

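Prefix Extension Algorithm (path discovery algorithm)
[Figure: candidates from I_1 with bucket vectors B_1 are combined with candidates from I_2 with bucket vectors B_2; a combination that fails in more than r = 1 hash tables is ignored]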

Prefix Extension Algorithm
[Figure: the surviving prefixes are extended in the same way with I_3 / B_3 and then I_4 / B_4]
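The prefix extension step can be sketched as follows, under a simplified reading of the slides: each candidate word x carries a vector B(x) with one set of heavy buckets per hash table, a prefix is extended by intersecting vectors table by table, and an extension is dropped when more than r tables end up with an empty set. The toy data (three tables, two words, string bucket labels) and the function names are hypothetical.

```python
def extend(prefixes, candidates, r):
    """Extend every surviving prefix by one more word, pruning mismatches."""
    extended = []
    for words, vector in prefixes:
        for x, bx in candidates.items():
            merged = [a & b for a, b in zip(vector, bx)]
            empty_tables = sum(1 for s in merged if not s)
            if empty_tables <= r:          # tolerate at most r failing tables
                extended.append((words + (x,), merged))
    return extended


def infer_keys(candidates_per_word, r):
    """Build whole keys word by word; returns tuples of word values."""
    first, rest = candidates_per_word[0], candidates_per_word[1:]
    prefixes = [((x,), list(vec)) for x, vec in first.items()]
    for candidates in rest:
        prefixes = extend(prefixes, candidates, r)
    return [words for words, _ in prefixes]


# Toy example with H = 3 tables and 2 words; bucket labels are illustrative.
I1 = {10: [{"b11"}, {"b21"}, {"b31"}],
      20: [{"b12"}, {"b22"}, {"b32"}]}
I2 = {7: [{"b11"}, {"b21"}, {"b31"}],
      9: [{"b12"}, {"b22"}, {"b31"}]}

keys_strict = infer_keys([I1, I2], r=0)
keys_tolerant = infer_keys([I1, I2], r=1)
```

In the toy data, (10, 7) agrees on a heavy bucket in every table, (20, 9) disagrees in one table and survives only with r = 1, and the two mixed combinations disagree in two or three tables and are pruned early rather than being carried through the remaining words.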

Recap
–Streaming data recording: key → IP mangling → modular hashing → reversible k-ary sketch (value stored)
–Heavy change detection: reversible k-ary sketch + change threshold → reverse hashing → reverse IP mangling → heavy change keys
n is the size of key space


Evaluation
Dataset
–A large US ISP (330M NetFlow records)
–NU (19M NetFlow records)
Efficient data recording
For the worst-case traffic, all 40-byte packets
–Software: 526 Mbps on a P4 3.2 GHz PC
–Hardware: 16 Gbps on a single FPGA board
–Only a few hundred KB to a couple of MB of memory used
–Only 15 memory accesses per packet for 48-bit reversible sketches and 16 per packet for 64-bit reversible sketches
Efficient heavy change detection and key inference
–0.34 seconds for 100 changes; … seconds for 1000 changes
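The worst-case line rates translate into packet rates as follows. The calculation assumes that Mbps and Gbps mean 10^6 and 10^9 bits per second and ignores framing overhead; both are assumptions made here, not statements from the slides.

```python
PACKET_BITS = 40 * 8   # worst case: every packet is 40 bytes


def packets_per_second(bits_per_second):
    """Packet rate needed to sustain a line rate with 40-byte packets."""
    return bits_per_second / PACKET_BITS


software_pps = packets_per_second(526e6)   # about 1.64 million packets/s
hardware_pps = packets_per_second(16e9)    # 50 million packets/s

# Time budget per packet at the hardware rate, in nanoseconds.
ns_per_packet_hw = 1e9 / hardware_pps
```

At 16 Gbps the budget is about 20 ns per packet, which gives a sense of why the number of memory accesses per packet is reported alongside throughput.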

Key Inference Accuracy
True positives and false positives of 16-bit reversible sketches for 32-bit IP addresses
[Deltoids]: S. Muthukrishnan and Graham Cormode, What's New: Finding Significant Differences in Network Data Streams, INFOCOM 2004

More Results
–Stress test with larger dataset: still accurate
–Scalable to larger key space size: similar results for 64-bit IP pairs
–Built an anomaly/intrusion detection system to detect, e.g., SYN flooding and port scans [ICDCS 2006]

Conclusions
Proposed the first reversible sketches, which
–Record high-speed network streams online
–Detect the heavy changes and infer the keys online
–Use little memory and a small number of memory accesses per packet
–Scale to large key space size

Backup Slides

Related work
Compared with [Deltoids]
–Better accuracy
–Better scalability to large key space
–Fewer memory accesses
[PCF, IMC 2004]: not reversible
[Q. Zhao et al., IMC 2005], [S. Venkataraman, NDSS 2005]: unique fan-out (fan-in) estimation

[Figure: modular hashing vs. optimal hashing]

Reversible sketch problem
However…
–Not reversible: lack of an inference API, INFERENCE(S, t)
–Important function for anomaly detection!
Decouple the recording stage of sketches from the detection stage to enable efficient combine and inference.
INFERENCE(S, t): given a threshold t, report the keys whose corresponding sums of updates are larger than the threshold.
Our contribution: an efficient algorithm for inference


Problem: Too many collisions
[Figure: 32-bit keys hashed to 12-bit bucket indices]
Solution: IP mangling

IP-mangling Use GF (Galois Extension Field) function for attack resilience

[Figure: modular hashing vs. modular hashing with IP mangling vs. optimal hashing]

Reverse Hashing for Each Module
H = 5, r = 1, K = 2^12
[Figure: heavy buckets b_11, b_12, b_21, b_22, b_31, b_32, b_41, b_42, b_51, b_52, where b_ij is the No. j heavy bucket in hash table i]
Take the first word as an example:
–All possible values of the first word for the No. j heavy bucket in hash table i
–All possible values of the first word in hash table i
–All possible values of the first word in the sketch

False positive reduction by verifying against the original sketch
[Figure: each inferred key is looked up in the original k-ary sketch; a key with estimate 180 exceeds the threshold of 150, is verified, and is kept in the final result]
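The verification step amounts to a filter over the inferred keys. In this sketch the `estimate` argument stands for a lookup in the original k-ary sketch; the key strings and the second estimate (90) are made-up illustration values, while 180 and the threshold of 150 come from the slide.

```python
def verify(candidates, estimate, threshold):
    """Keep only inferred keys whose estimate in the original sketch exceeds the threshold."""
    return [key for key in candidates if estimate(key) > threshold]


# Hypothetical estimates returned by the original k-ary sketch.
original_estimates = {"10.0.0.1": 180, "10.0.0.2": 90}

final_result = verify(
    candidates=["10.0.0.1", "10.0.0.2"],
    estimate=lambda key: original_estimates.get(key, 0),
    threshold=150,
)
```

Only the key whose estimate is 180 passes; the other candidate is treated as a false positive of the reverse hashing step and dropped.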

K-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
First to detect flow-level heavy changes in massive data streams at network traffic speeds
APIs
–UPDATE(S, k, u): T_j[h_j(k)] += u (for all j)
–ESTIMATE(S, k): sum of updates for key k
–Linear combination: S = COMBINE(α, S_1, β, S_2), i.e. α × S_1 + β × S_2 = S