Payload Attribution via Hierarchical Bloom Filters

Slides:



Advertisements
Similar presentations
A CGA based Source Address Authentication Method in IPv6 Access Network(CSA) Guang Yao, Jun Bi and Pingping Lin Tsinghua University APAN26 Queenstown,
Advertisements

IP Router Architectures. Outline Basic IP Router Functionalities IP Router Architectures.
A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.
Data Streaming Algorithms for Accurate and Efficient Measurement of Traffic and Flow Matrices Qi Zhao*, Abhishek Kumar*, Jia Wang + and Jun (Jim) Xu* *College.
1 Fast Routing Table Lookup Based on Deterministic Multi- hashing Zhuo Huang, David Lin, Jih-Kwon Peir, Shigang Chen, S. M. Iftekharul Alam Department.
ForNet: A Distributed Forensic Network Kulesh Shanmugasundaram
11 Packet Sampling for Worm and Botnet Detection in TCP Connections Reporter: 林佳宜 /10/25.
Hash-Based IP Traceback Best Student Paper ACM SIGCOMM’01.
IP Spoofing CIS 610 Week 2: 13-JAN Definition and Background n Def’n: The forging of the IP Source Address field in an IP packet n First mentioned.
Hit or Miss ? !!!.  Cache RAM is high-speed memory (usually SRAM).  The Cache stores frequently requested data.  If the CPU needs data, it will check.
Bloom Filters Kira Radinsky Slides based on material from:
Deterministic Memory- Efficient String Matching Algorithms for Intrusion Detection Nathan Tuck, Timothy Sherwood, Brad Calder, George Varghese Department.
Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluation, and Applications Robert Schweller 1, Zhichun Li 1, Yan Chen 1, Yan Gao 1, Ashish.
Secure communications Week 10 – Lecture 2. To summarise yesterday Security is a system issue Technology and security specialists are part of the system.
Security in Wireless LAN Layla Pezeshkmehr CS 265 Fall 2003-SJSU Dr.Mark Stamp.
Beyond Bloom Filters: From Approximate Membership Checks to Approximate State Machines By F. Bonomi et al. Presented by Kenny Cheng, Tonny Mak Yui Kuen.
Mitigating Bandwidth- Exhaustion Attacks using Congestion Puzzles XiaoFeng Wang Michael K. Reiter.
بسم الله الرحمن الرحيم NETWORK SECURITY Done By: Saad Al-Shahrani Saeed Al-Smazarkah May 2006.
Improved TCAM-based Pre-Filtering for Network Intrusion Detection Systems Department of Computer Science and Information Engineering National Cheng Kung.
Reverse Hashing for Sketch Based Change Detection in High Speed Networks Ashish Gupta Elliot Parsons with Robert Schweller, Theory Group Advisor: Yan Chen.
Secure routing for structured peer-to-peer overlay networks (by Castro et al.) Shariq Rizvi CS 294-4: Peer-to-Peer Systems.
ECE 526 – Network Processing Systems Design Network Security: string matching algorithm Chapter 17: George Varghese.
Automated Worm Fingerprinting Sumeet Singh, Cristian Estan, George Varghese, and Stefan Savage Manan Sanghi.
Hash-Based IP Traceback Alex C. Snoeren, Craig Partidge, Luis A. Sanchez, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, and W. Timothy Strayer.
1Bloom Filters Lookup questions: Does item “ x ” exist in a set or multiset? Data set may be very big or expensive to access. Filter lookup questions with.
BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations Minlan Yu Princeton University Joint work with Alex Fabrikant,
Review of IP traceback Ming-Hour Yang The Department of Information & Computer Engineering Chung Yuan Christian University
Bloom Filters An Introduction and Really Most Of It CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook.
Fast and deterministic hash table lookup using discriminative bloom filters  Author: Kun Huang, Gaogang Xie,  Publisher: 2013 ELSEVIER Journal of Network.
Traceback Pat Burke Yanos Saravanos. Agenda Introduction Problem Definition Traceback Methods  Packet Marking  Hash-based Conclusion References.
Scalable and Efficient Data Streaming Algorithms for Detecting Common Content in Internet Traffic Minho Sung Networking & Telecommunications Group College.
Sujayyendhiren RS, Kaiqi Xiong and Minseok Kwon Rochester Institute of Technology Motivation Experimental Setup in ProtoGENI Conclusions and Future Work.
Approximate Encoding for Direct Access and Query Processing over Compressed Bitmaps Tan Apaydin – The Ohio State University Guadalupe Canahuate – The Ohio.
Implementing a Port Knocking System in C Honors Thesis Defense by Matt Doyle.
Data Compression By, Keerthi Gundapaneni. Introduction Data Compression is an very effective means to save storage space and network bandwidth. A large.
Design of a System for Real- Time Worm Detection Bharath Madhusudan, John Lockwood Department of Computer Science and Engineering Washington University,
Firewall Technologies Prepared by: Dalia Al Dabbagh Manar Abd Al- Rhman University of Palestine
A Dynamic Packet Stamping Methodology for DDoS Defense Project Presentation by Maitreya Natu, Kireeti Valicherla, Namratha Hundigopal CISC 859 University.
Large-Scale IP Traceback in High-Speed Internet : Practical Techniques and Theoretical Foundation Jun (Jim) Xu Networking & Telecommunications Group College.
Jennifer Rexford Princeton University MW 11:00am-12:20pm Measurement COS 597E: Software Defined Networking.
1 A Throughput-Efficient Packet Classifier with n Bloom filters Authors: Heeyeol Yu and Rabi Mahapatra Publisher: IEEE GLOBECOM 2008 proceedings Present:
Trajectory Sampling for Direct Traffic Oberservation N.G. Duffield and Matthias Grossglauser IEEE/ACM Transactions on Networking, Vol. 9, No. 3 June 2001.
Efficient Peer-to-Peer Keyword Searching 1 Efficient Peer-to-Peer Keyword Searching Patrick Reynolds and Amin Vahdat presented by Volker Kudelko.
Geo Location Service CS218 Fall 2008 Yinzhe Yu, et al : Enhancing Location Service Scalability With HIGH-GRADE Yinzhe Yu, et al : Enhancing Location Service.
Department of Computer Science and Engineering Applied Research Laboratory Architecture for a Hardware Based, TCP/IP Content Scanning System David V. Schuehler.
BARD / April BARD: Bayesian-Assisted Resource Discovery Fred Stann (USC/ISI) Joint Work With John Heidemann (USC/ISI) April 9, 2004.
Hash-Based IP Traceback Alex C. Snoeren †, Craig Partridge, Luis A. Sanchez, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, W. Timothy Strayer.
EC week Review. Rules of Engagement Teams selected by instructor Host will read the entire questions. Only after, a team may “buzz” by raise of.
PGP & IP Security  Pretty Good Privacy – PGP Pretty Good Privacy  IP Security. IP Security.
Intrusion Detection Systems Paper written detailing importance of audit data in detecting misuse + user behavior 1984-SRI int’l develop method of.
BY: CHRIS GROVES Privacy in the Voting Booth. Reason for Privacy Voters worry that their vote may be held against them in the future  People shouldn’t.
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Author: Haoyu Song, Murali Kodialam, Fang Hao and T.V. Lakshman Publisher/Conf. : IEEE International Conference on Network Protocols (ICNP), 2009 Speaker:
Automated Worm Fingerprinting Authors: Sumeet Singh, Cristian Estan, George Varghese and Stefan Savage Publish: OSDI'04. Presenter: YanYan Wang.
Hyperion :High Volume Stream Archival Divya Muthukumaran.
Wired Equivalent Privacy (WEP) Chris Overcash. Contents What is WEP? What is WEP? How is it implemented? How is it implemented? Why is it insecure? Why.
Hash-Based IP Traceback Alex C. Snoeren †, Craig Partridge, Luis A. Sanchez, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent, W. Timothy Strayer.
Hash-Based IP Traceback Alex C. Snoeren +, Craig Partridge, Luis A. Sanchez ++, Christine E. Jones, Fabrice Tchakountio, Stephen T. Kent and W. Timothy.
K. Salah1 Security Protocols in the Internet IPSec.
Jessica Kornblum DSL Seminar Nov. 2, 2001 Hash-Based IP Traceback Alex C. Snoeren +, Craig Partridge, Luis A. Sanchez ++, Christine E. Jones, Fabrice Tchakountio,
BUFFALO: Bloom Filter Forwarding Architecture for Large Organizations Minlan Yu Princeton University Joint work with Alex Fabrikant,
25/09/ Firewall, IDS & IPS basics. Summary Firewalls Intrusion detection system Intrusion prevention system.
Searchable Encryption in Cloud
Introduction Wireless devices offering IP connectivity
Single-Packet IP Traceback
Cryptographic Hash Functions Part I
Preventing Internet Denial-of-Service with Capabilities
Hash Functions for Network Applications (II)
A flow aware packet sampling mechanism for high speed links
Presentation transcript:

Payload Attribution via Hierarchical Bloom Filters Kulesh Shanmugasundaram, Hervé Brönnimann Nasir Memon http://isis.poly.edu/projects/fornet/ Joint work with Herve and Nasir

suppose we get an email like this, what can we do about it? How can we answer these questions

Have Problem. Will Solve. Hash-packets: Storage is better Need packets for attribution! We just have emails! Packet-loggers: Need ~2TB/day IS is not going to like this Bloom-Filters: Storage is better than hashes SPIE uses it for single packet traceback We just have emails! To tackle the problem we have some options We can use packet-loggers but they will consume lot of storage. In our network it will be about 2TB/day We can hash the packets instead but simply hashing the whole packet requires the entire packet later when we need to query. But, we only have emails (or application view) Bloom filters are used successfully in SPIE to traeback single packets. Again, the scheme necessitates the whole packet.

Our Solution: Create digests of payload s.t we can: So the problem is how to we identify the sources/destinations of a bit-string in a network? This talk is about our solution which creates digests of payload s.t it can attribute excerpts of payload and reduce storage requirements at the same time. The Problem: Identify the sources/ destinations of a bit-string in a network Our Solution: Create digests of payload s.t we can: Attribute excerpts of payload Reduce storage requirements

Bloom Filters FP = (1- (1 - 1/m)kn)k Bloom Filter: Randomized data for representing a set in order to support membership queries. Insert(x): Flip bits H1(x)… Hk(x) to ‘1’ IsMember(y): If H1(Y) … Hk(Y) all ‘1’ “yes” otherwise “no” Can tradeoff memory (m), compute power (k), and accuracy (FP) m – length of bit vector (range of H(.)) k – number of hashes per element n – number of elements in the set H2(x) H1(x) Hk(x) H3(x) 1 m-bit vector The core of our digesting scheme is a bloom filter. Basically, a bloom filter is a randomized ds tha represents a set of elements and can answer membership queries with high-probability. -- few words about insertion and testing -- few words on trade-offs FP = (1- (1 - 1/m)kn)k

Packet Digests & Bloom Filters Snoeren et. al. used it successfully in SPIE for single packet traceback (“Hash-Based IP Traceback”) Space Efficient: 16-bits per packet (m/n=16) and 8 hashes (k=8) false positive (FP) = 5.74 x 10-4 No false negatives! However, we don’t have packets. We only have some excerpt of payload Don’t know where the excerpt was aligned in the packet Extend Bloom Filters to support excerpt/substring matching But we cannot digest the whole packet if we were to support queries on excerpts.

Block-based Bloom Filter P = s p0 p1 p2 … p(n/s) P = create s-byte blocks of payload P = (p0|0) (p1|s) (p2|2s) … (p(n/s)|n/s) append blocks’ Offset (in payload) Explain this quickly. Insert each block into a Bloom Filter

X Q = s q0 q1 q2 … q(l/s) Q = BBF = … (q0|0) (q0|s) (q1|2s) (q2|3s) create s-byte blocks of query string Try all possible offsets (q0|0) (q0|s) q0=p1 (q1|2s) q1=p2 (q2|3s) q2=p3 H1(q0|0)=1 H2(q0|0)=1 H3(q0|0)=0 Run thru this quickly. X “q0q1q2” was seen in a payload at offset ‘s’ BBF = (p0|0) (p1|s) (p2|2s) (p2|3s) … (p(n/s)|n/s)

A B R C D P1 = P2 = BBF = “Offset Collisions” (A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s) (C|0) (D|s) (A|2s) (B|3s) (R|4s) “Offset Collisions” (A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s) (C|0) (D|s) (A|2s) (B|3s) (R|4s) “offset collisions” are bad. Makes the system easy to circumvent and false positives! For query strings: “AD”, “CB”, “DR”, “AA” etc. BBF falsely identifies them as seen in the payload! Because BBF cannot distinguish between P1 and P2

Hierarchical Bloom Filter An HBF is basically a set of BBF for geometrically increasing sizes of blocks. (p0|0) (p1|s) (p2|2s) … (p(n/s)|n/s) level-0 P = (p0p1|0) (p2p3|2s) (p(n/s-1)p(n/s)|(n/s-1)) level-1 (p0p1p2p3|0) level-2 Hierarchical Bloom Filter We can eliminate “offset collisions” by inserting the blocks in a hierarchy. This allows us to check if in fact two blocks occurred together. Querying is similar to BBF at each level. Querying is similar to BBF. Matches at each level can be confirmed a level above.

Hierarchical Bloom Filter Now we have a data structure that: Allows us to do substring matching Avoids “offset collisions” Improves accuracy over standard Bloom Filter and BBF FPeHBF << FPeBBF << FP HBF Performance Summary: Using a Bloom Filter with m/n=5, k=2, FP=0.1090 Block size of 128-bytes Achieves about 100 fold savings in storage For a 512-byte query FPHBF = 2.75x10-4 In summary, we have a … (pretty much say what’s on the slide) Highlight the performance of HBF. (128-byte block is a sweet spot for tracing emails and worms because usually there are at least 1KB) (I will put the equations for FPs as soon as I figure out how best to present them!)

A Payload Attribution System (PAS) The HBF presented so far allow us to do substring matching but does not tie a source/destination with the matched excerpt. But we can easily tie in source/destination into HBF, just like we tied the offsets of blocks. Now we insert, (block||offset) and (block||offset||(source,dest)) (Point it in the figure) Source/destination lists are readily available in networks (firewalls, logs, netflow) provide this info. But our prototype maintains one anyway. System design of a payload attribution system Packet id or host identifier is (SourceIP||DestIP) Although host identifier can be obtained from firewalls and routers (Netflow), a list of host ids is maintained by the system

Tracking MyDoom Recorded all email traffic for a week Using HBF and raw traffic Was not aware of MyDoom during this collection When signatures became available we used them to query the system To find hosts that are infected in our network How the hosts were infected

MyDoom’s Weekly Progress…

MyDoom’s Daily Progress

Attacks on PAS Resource Exhaustion: Malicious Transformation: Flood the network with random bits of data Malicious Transformation: Create packets of length (blocksize – 1) Stuffing: Stuff every other block with application dependent escape characters For smaller blocks we can try to guess for larger blocks it is not possible! Exploiting Collisions: Hash collisions Very unlikely for strong hash functions We use a random seed for every HBF so it makes it more difficult Packet collisions A possibility in Block-based Bloom Filters but not in HBFs Streaming transformations: Encryption, compression

Conclusion & Future Work Summary: A data structure for digesting payloads Supports queries on excerpts Reduces storage requirements Provides some privacy guarantees Payload Attribution System: Capable of attributing excerpts of payload to source/destination In alpha-tests in our campus network Future work: Value-based blocking using Rabin fingerprints Enhancing storage of host ids Hardware implementations