
New Streaming Algorithms for Fast Detection of Superspreaders. Shobha Venkataraman*. Joint work with: Dawn Song*, Phillip Gibbons¶, Avrim Blum*. (*Carnegie Mellon University, ¶Intel Research Pittsburgh)

2 Superspreaders. A k-superspreader is a host that contacts at least k distinct destinations in a short time period. Goal: given a stream of packets, find the k-superspreaders. Why care about superspreaders? They are indicators of possible network attacks: e.g., a compromised host in a worm propagation contacts many distinct destinations, and the Slammer worm contacted up to 26,000 hosts per second! Automatic identification is useful in logging and throttling attack traffic.

3 Heavy Distinct-Hitters. General problem: given a stream of (x, y) pairs, find all x paired with at least k distinct y. This is the heavy distinct-hitter problem. Applications: find dests contacted by many distinct srcs; find ports contacted by many distinct srcs/dests, or with high ICMP traffic; find potential spammers without per-src information; find nodes that contact many other nodes in peer-to-peer networks.

4 Challenges. Need very efficient algorithms for high-speed links. Superspreaders are often a tiny fraction of network traffic: e.g., in the traces, < 0.004% of total traffic. Need algorithms in the streaming model: only one pass over the data is allowed, with much less storage than the data. Distributed monitoring is desirable, and must need little communication between monitors.

5 Strawman Approaches. Approach 1: track every src with a list of the distinct destinations contacted, e.g., Snort. Too much storage! Approach 2: track every src with a distinct counter per src [Estan et al. 03]. Also too much storage! Approach 3: use the multiple-cache data structure of Weaver et al. 04. It was designed for a different problem and does not scale for finding superspreaders.
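To make the storage problem concrete, here is a minimal Python sketch of Approach 1, assuming packets arrive as (src, dst) tuples. The function name `exact_superspreaders` is hypothetical and only illustrates the idea; it is not how Snort is implemented.

```python
from collections import defaultdict

def exact_superspreaders(packets, k):
    """Strawman: remember every distinct destination for every source.

    Memory grows with the number of distinct (src, dst) flows, which is
    exactly the cost the streaming algorithms try to avoid.
    """
    dests = defaultdict(set)
    for src, dst in packets:
        dests[src].add(dst)  # duplicates of the same flow are absorbed by the set
    return {src for src, seen in dests.items() if len(seen) >= k}

packets = [("s1", "d1"), ("s1", "d2"), ("s1", "d2"), ("s2", "d1"), ("s1", "d3")]
result = exact_superspreaders(packets, k=3)
```

The answer is exact, but every source keeps its full destination set, even sources that only ever contact one host.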

6 Outline: Introduction; Problem Definition; Algorithms; Extensions; Experiments; Conclusions.

7 Formal Problem Definition. Given k, b > 1, and probability of failure δ: any k-superspreader is output with probability at least 1 - δ; any src that contacts < k/b distinct dests is output with probability < δ; srcs in between may or may not be output. Thus, expect to identify a src as a superspreader after it contacts more than k/b and fewer than k distinct dests.
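The two conditions can be written compactly as follows. The symbol D(s), the number of distinct destinations contacted by source s, is notation introduced here for illustration and does not appear on the slide.

```latex
\[
\Pr[\, s \text{ is output} \,] \;\ge\; 1 - \delta \quad \text{if } D(s) \ge k,
\qquad
\Pr[\, s \text{ is output} \,] \;<\; \delta \quad \text{if } D(s) < k/b .
\]
```

No requirement is placed on sources with k/b ≤ D(s) < k.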

8 Example. k = 1000, b = 2, δ = 0.05. Then Pr[src output | contacts ≥ 1000 dests] > 0.95 and Pr[src output | contacts < 500 dests] < 0.05. Expect a gap between normal behaviour and superspreaders. [Figure: number of distinct destinations contacted by three sources, s1 with d1 = 1000, s2 with d2 = 750, s3 with d3 = 500.]

9 Theoretical Guarantees. Given k, b > 1, and δ, can set parameters so that, for N distinct flows: Pr[k-superspreader output] > 1 - δ and Pr[false positive output] < δ. Expected memory (fixed b): O((N/k) log(1/δ)). Note: as many as N/k k-superspreaders are possible, so this is within O(log(1/δ)) of the lower bound. Per-packet processing time: constant. At most 2 hashes and 2 memory accesses per packet; most packets get 1 hash, or 1 hash and 1 memory access.

10 Outline: Introduction; Problem Definition; Algorithms (One-Level Filtering Algorithm, Two-Level Filtering Algorithm); Extensions; Experiments; Conclusions.

11 One-Level Filtering Algorithm. For a packet (s, d):
Step 1: Compute h(s, d).
Step 2: If h(s, d) > c, discard the packet.
Step 3: If h(s, d) < c, insert (s, d) into the hash table.
Step 4: Report all srcs with more than r destinations in the hash table.
(We're effectively sampling distinct flows at rate c.)
[Figure: hash table with one row per sampled source s1, ..., sm, each row listing that source's sampled destinations.]
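The four steps of one-level filtering can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `unit_hash` is a hypothetical helper that maps a flow to [0, 1) using the first 48 bits of a SHA-1 digest, and the hash table is a plain dictionary of sets.

```python
import hashlib
from collections import defaultdict

def unit_hash(src, dst):
    """Assumed helper: map a (src, dst) flow to a float in [0, 1) via SHA-1."""
    digest = hashlib.sha1(f"{src}|{dst}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def one_level_filter(packets, c, r):
    """Sample distinct flows at rate c; report srcs with more than r sampled dests."""
    table = defaultdict(set)
    for src, dst in packets:
        if unit_hash(src, dst) < c:   # Steps 1-3: hash, then keep or discard
            table[src].add(dst)       # repeated packets of one flow count once
    return {src for src, dsts in table.items() if len(dsts) > r}  # Step 4

# One source contacting 1000 destinations, one contacting only 10.
packets = [("heavy", f"d{i}") for i in range(1000)]
packets += [("light", f"d{i}") for i in range(10)]
flagged = one_level_filter(packets, c=0.052, r=39)
```

Because the decision depends only on h(s, d), every packet of the same flow gets the same verdict, so the table holds a sample of distinct flows rather than of packets. The light source can never be reported here, since it has fewer than r destinations in total.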

12 Example: One-Level Filtering. k = 1000, b = 2, δ = 0.05. Compute that c = 0.052, r = 39. In expectation, 94.8% of packets require only one computation; the remaining 5.2% require more processing and storage.
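As a sanity check on these numbers, the sketch below computes the exact binomial error probabilities for c = 0.052 and r = 39, assuming each distinct flow is sampled independently with probability c and taking δ = 0.05 from the experimental setup. The helper name `binom_cdf` is illustrative.

```python
from math import comb

def binom_cdf(n, p, r):
    """Pr[X <= r] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(r + 1))

k, b, c, r = 1000, 2, 0.052, 39

expected_k = k * c             # expected samples for a src with k dests
expected_kb = (k // b) * c     # expected samples for a src with k/b dests

# A src with exactly k dests is missed if at most r of its flows are sampled.
miss = binom_cdf(k, c, r)
# A src with exactly k/b dests is falsely reported if more than r are sampled.
false_pos = 1 - binom_cdf(k // b, c, r)
```

The threshold r = 39 sits between the two expected sample counts (about 52 and 26), and both error probabilities should come out below 0.05 under this model.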

13 Two-Level Filtering: Intuition (I). One-level filtering stores many srcs that contact few dests. A threshold sampling rate is needed to distinguish between srcs that contact k and k/b dests. In the expected distribution most srcs contact few dests, but all srcs are sampled at the threshold rate. Use two-level filtering to reduce memory usage on such traffic distributions. Coarse rate: decide whether to sample at the fine rate. Fine rate: distinguish between srcs sending to k and k/b dests.

14 Two-Level Filtering: Intuition (II). Example: k = 1000, b = 2. Suppose the coarse rate is 1/100. Expect that a 1000-superspreader will show up once in its first 100 dests, and w.h.p. in, say, its first 200 dests. Use the remaining 800 dests to distinguish it w.h.p. from a source that sends to only 500 dests. Only 1% of the sources that send to few dests are stored. Similar worst-case guarantees, but significantly better under some natural distributions.

15 Two-Level Filtering Algorithm. For a packet (s, d):
Step 1: Compute h1(s, d). Sample: if h1(s, d) < r1 and s is present in C, compute k = r1/m and insert s into hash table F_k.
Step 2: Compute h2(s, d). Sample: if h2(s, d) < r2, store s in C.
Return all the sources that appear in at least r of the hash tables F_i.
[Figure: coarse filter C holding the coarsely sampled sources, and fine hash tables F_1, ..., F_m, each holding a list of sources.]
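A minimal Python sketch of the two steps follows. Several details are assumptions, labeled in the comments: the transcript's index formula is hard to read, so the sketch picks the fine table by which of m equal slices of [0, r1) the value h1(s, d) falls into; the two hash functions are derived from SHA-1 with different salts; and C and the F_i are plain Python sets.

```python
import hashlib

def unit_hash(src, dst, salt):
    """Assumed helper: salted SHA-1 mapped to a float in [0, 1)."""
    digest = hashlib.sha1(f"{salt}|{src}|{dst}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def two_level_filter(packets, r1, r2, m, r):
    coarse = set()                      # C: sources that passed the coarse sample
    fine = [set() for _ in range(m)]    # F_1 .. F_m
    for src, dst in packets:
        h1 = unit_hash(src, dst, "h1")
        if h1 < r1 and src in coarse:   # Step 1: fine sample, only for srcs in C
            # Assumption: table index = slice of [0, r1) containing h1.
            idx = min(int(h1 * m / r1), m - 1)
            fine[idx].add(src)
        if unit_hash(src, dst, "h2") < r2:  # Step 2: coarse sample
            coarse.add(src)
    counts = {}
    for table in fine:
        for src in table:
            counts[src] = counts.get(src, 0) + 1
    return {src for src, n in counts.items() if n >= r}

# Toy run with both rates set to 1 so the outcome does not depend on the hash.
result = two_level_filter([("a", "x"), ("a", "y"), ("b", "x")], r1=1.0, r2=1.0, m=4, r=1)
```

In the toy run, source "a" enters C on its first packet and a fine table on its second, while "b" sends a single packet and never reaches a fine table. Sources that contact few destinations mostly never enter C, which is where the memory saving comes from.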

16 Example: Two-Level Filtering. k = 1000, b = 2, δ = 0.05. Compute r1 = 0.15, r2 = 0.006, m = 100. Case 1, srcs that contact 1 distinct dest each: 85% of flows are discarded; 15% are checked for presence in the coarse filter; 0.6% are entered into the coarse filter. Case 2, srcs that are superspreaders: 85% of flows are discarded per superspreader; 15% of flows require entry into the fine filter.

17 Outline: Introduction; Problem Definition; Algorithms; Extensions; Experiments; Conclusions.

18 Extension: Deletions in Stream. Goal: find superspreaders when deletions are allowed in the stream. Application: find srcs with many distinct connection failures. Connection initiated: the (src, dst) pair appears in the stream. Response received: that (src, dst) pair gets deleted. Example stream at two successive points in time: (s1, d1, 1), (s1, d2, 1), (s1, d3, 1), (s2, d2, 1), (s1, d4, 1), (s2, d2, -1), ... and then (s1, d1, 1), (s1, d2, 1), (s1, d3, 1), (s2, d2, 1), (s1, d4, 1), (s2, d2, -1), (s1, d2, -1), ...
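The slide does not spell out the algorithm for this setting. One plausible way to handle deletions, sketched below purely as an assumption, relies on the fact that hash-based sampling is deterministic: a deletion of (s, d) is sampled exactly when its insertion was, so it can simply remove the stored pair. The helper names are hypothetical.

```python
import hashlib
from collections import defaultdict

def unit_hash(src, dst):
    """Assumed helper: map a (src, dst) flow to a float in [0, 1) via SHA-1."""
    digest = hashlib.sha1(f"{src}|{dst}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def one_level_with_deletions(stream, c, r):
    """stream holds (src, dst, sign) with sign +1 for insert and -1 for delete."""
    table = defaultdict(set)
    for src, dst, sign in stream:
        if unit_hash(src, dst) >= c:
            continue                    # unsampled flows are ignored either way
        if sign > 0:
            table[src].add(dst)         # connection initiated
        else:
            table[src].discard(dst)     # response received: pair deleted
    return {src for src, dsts in table.items() if len(dsts) > r}

stream = [("s1", "d1", 1), ("s1", "d2", 1), ("s1", "d3", 1), ("s2", "d2", 1),
          ("s1", "d4", 1), ("s2", "d2", -1), ("s1", "d2", -1)]
# c = 1.0 keeps every flow, so this run shows the exact stream semantics.
pending = one_level_with_deletions(stream, c=1.0, r=2)
```

After the two deletions, s1 still has three unanswered destinations (d1, d3, d4) and s2 has none.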

19 Extension: Sliding Windows. Goal: find superspreaders over sliding windows of packets, e.g., in only the most recent t packets, or the last 1 hour. Example stream at three successive points in time: ... (s1, d1), (s1, d2), (s1, d3), (s2, d2), (s2, d4) ...; then ... (s1, d1), (s1, d2), (s1, d3), (s2, d2), (s2, d4), (s1, d5) ...; then ... (s1, d1), (s1, d2), (s1, d3), (s2, d2), (s2, d4), (s1, d5), (s3, d4) ...

20 Extension: Distributed Monitoring. Given: a set of monitoring points, each of which sees a stream of packets. Goal: find superspreaders in the union of the streams. The one-level filtering algorithm needs very little communication. [Figure: three monitors A, B, C with example streams (s1, d1), (s1, d2), (s2, d3), (s1, d1), ...; (s1, d1), (s1, d3), (s2, d4), (s2, d5), ...; and (s1, d1), (s2, d2), (s3, d3), (s4, d4), ...]
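One way to see why one-level filtering suits this setting: if every monitor uses the same hash function and rate c, each can sample locally and ship only its small table, and a coordinator takes the union. The sketch below illustrates that idea under those assumptions; the function names, and the pairing of the three example streams with monitors A, B and C in order, are choices made here rather than details from the slide.

```python
import hashlib
from collections import defaultdict

def unit_hash(src, dst):
    """Assumed helper shared by all monitors: SHA-1 mapped to [0, 1)."""
    digest = hashlib.sha1(f"{src}|{dst}".encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48

def local_sample(packets, c):
    """Run at each monitor: keep only flows whose hash falls below c."""
    table = defaultdict(set)
    for src, dst in packets:
        if unit_hash(src, dst) < c:
            table[src].add(dst)
    return table

def merge_and_report(tables, r):
    """Run at the coordinator: union the sampled tables, then apply threshold r."""
    merged = defaultdict(set)
    for table in tables:
        for src, dsts in table.items():
            merged[src] |= dsts         # a flow seen at several monitors counts once
    return {src for src, dsts in merged.items() if len(dsts) > r}

stream_a = [("s1", "d1"), ("s1", "d2"), ("s2", "d3"), ("s1", "d1")]
stream_b = [("s1", "d1"), ("s1", "d3"), ("s2", "d4"), ("s2", "d5")]
stream_c = [("s1", "d1"), ("s2", "d2"), ("s3", "d3"), ("s4", "d4")]
# c = 1.0 keeps every flow so the toy result does not depend on the hash.
tables = [local_sample(s, c=1.0) for s in (stream_a, stream_b, stream_c)]
reported = merge_and_report(tables, r=2)
```

Because the sampling decision depends only on the flow, the flow (s1, d1) seen at all three monitors is either kept by all of them or by none, and the union never double-counts it.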

21 Outline: Introduction; Problem Definition; Algorithms; Extensions; Experiments; Conclusions.

22 Experimental Setup. Experiments were run on a Pentium IV, 1.8 GHz, with 1 GB RAM. Traces were taken from the NLANR archive, ranging from 2.8 million packets (65 sec) to 4.5 million packets (4.5 min). Added 100 srcs that contact k distinct dests and 100 srcs that contact k/b distinct dests. A randomly generated SHA-1 hash function was used for each run. For all experiments, δ = 0.05.

23 Experimental Results (I): Accuracy. Discussion: both algorithms have the desired accuracy. The false positive rate is much less than 0.05, since most (eligible) srcs send to many fewer than k/b dests. The observed false positives come only from srcs close to the boundary.

24 Experimental Results (II). 1LF = 1-Level Filtering; 2LF-T = 2-Level Filtering, hash-table implementation; 2LF-B = 2-Level Filtering, Bloom-filter implementation. As expected, when b increases, sampling rates decrease, and total memory usage decreases. 2LF-B has the least memory usage. [Figure panels: k = 200, b = 2; k = 200, b = 5; k = 200, b = 10.]

25 Experimental Results (III). 1LF = 1-Level Filtering; 2LF-T = 2-Level Filtering, hash-table implementation; 2LF-B = 2-Level Filtering, Bloom-filter implementation. As expected, when k increases, sampling rates decrease, and total memory usage decreases. 2LF-B has the least memory usage. [Figure panels: k = 500, b = 2; k = 1000, b = 2; k = 5000, b = 2.]

26 Related Work. Networking, related problems: finding heavy-hitters [Estan-Varghese 02], multidimensional traffic clusters [Estan+ 03], distribution of flow lengths [Duffield+ 03], large changes in network traffic [Cormode-Muthukrishnan 03]. Streaming algorithms, most closely related: counting the number of distinct values in a stream [Flajolet-Martin 85, Alon-Matias-Szegedy 99, Cohen 97, Gibbons-Tirthapura 02, Bar-Yossef+ 02, Cormode+ 02].

27 Summary. Defined the superspreader (and heavy distinct-hitter) problem. One-pass streaming algorithms: theoretical guarantees on accuracy and overhead; experimental analysis validates the theoretical results; extensions to the model with deletions, sliding windows and distributed monitoring. The novel two-level filtering scheme may be of independent interest.

28 Thank you!

29 Motivation (II). Superspreaders are different from heavy-hitters! We care about many distinct destinations. A few large file transfers make a heavy-hitter, but not a superspreader. Superspreaders are not necessarily heavy-hitters: in the test traces, superspreaders accounted for < 0.004% of the total traffic analyzed.

30 Theoretical Guarantees. Given k, b > 1, and δ, can set parameters for both algorithms so that: Pr[k-superspreader output] > 1 - δ and Pr[false positive output] < δ. Expected memory (fixed b): O((N/k) log(1/δ)). Per-packet processing time: constant. At most 2 hashes and 2 memory accesses per packet; most packets get 1 hash, or 1 hash and 1 memory access. Optimization: implementing Two-Level Filtering with Bloom filters decreases memory usage but increases computational cost.
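For readers unfamiliar with the Bloom-filter optimization, here is a minimal, generic Bloom filter in Python. It is a textbook sketch rather than the implementation used in the experiments; the class name, the salted SHA-1 hashing and the parameter names are assumptions. Each of the sets in two-level filtering could in principle be replaced by such a structure.

```python
import hashlib

class BloomFilter:
    """Fixed-size bit array; membership tests may give false positives,
    never false negatives."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-1 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha1(f"{i}|{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add("s1")
present = "s1" in bf
```

The filter stores bits instead of source identifiers, which is where the memory saving comes from, while each insert or lookup costs several hash computations instead of one, matching the trade-off noted on the slide.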