Polytechnic University, ECE Department: Detection of "Hot Spots". Paper title: Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations

Presentation transcript:

Slide 1: Detection of "Hot Spots"
Paper title: Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations
Liang, Chao

Slide 2: Motivation
- "Hot spots" in the Internet
  - Super source (large fan-out): hosts infected by a worm (e.g., the Slammer worm)
  - Super destination (large fan-in): DDoS victim
- Internet attacks are increasing in severity
  - Network security monitoring
- Challenges
  - High packet arrival rate
  - Speed requirements on RAM (DRAM vs. SRAM)
  - Per-flow state maintenance is impractical

Slide 3: How to find the needle in the haystack
- IP flow
  - Abstraction: the set of packets that share the same addresses, ports, etc.
  - Flow label: source-destination pair
- General problem: heavy distinct-hitters
  - Given a stream of flow labels (source-destination pairs), find all sources that are paired with a large number of distinct destinations.
  - To detect super destinations, reverse the flow label.
[Figure: flow 1, flow 2, flow 3]

Slide 4: Weapons
- Previous techniques
  - Flow state maintenance
  - Probabilistic counting
  - Bloom filters
  - Multiresolution bitmap
  - ...
- This paper
  - Sampling
  - Network data streaming

Slide 5: Paper
- Qi Zhao, Abhishek Kumar, Jun Xu, "Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations", IMC 2005

Slide 6: Outline of the rest of the talk
- Introduction of one previous work
  - Traditional hash-based flow sampling
- Main approach
  - Simple scheme
  - Advanced scheme
- Evaluation
- Summary

Slide 7: Traditional hash-based flow sampling
- Flow sampling
  - Sample flows with probability p
  - A hash function H maps the flow label to a value uniformly distributed in [0, 1)
  - If H(flow label) < p, the flow is sampled
- Hash tables
  - HT1: detect and discard duplicate flow labels
    - Indexed by a hash of the flow label
    - Element: list of flow label pairs
  - HT2: count the number of flows
    - Indexed by a hash of srcIP
    - Element: list of pairs

Slide 8: Traditional hash-based flow sampling (continued)
- Fan-out calculation
  - Compare against a threshold to decide whether to report a super source
  - Scale the estimate to compensate for sampling: Ē = E * (1/p)
- Performance analysis
  - Key reason it is ineffective: low sampling rate
    - Update cost of the hash tables (in DRAM)
    - Influence of elephant flows
  - Performance bottleneck: querying the first hash table
  - Result
    - The sampling rate must satisfy p << Hs / Tr
      - Hs: operating speed of the hash table
      - Tr: arrival rate of traffic
    - Estimation error scales with 1/p
    - p is too low!
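
The two slides on the traditional approach can be summarized in a short Python sketch. It is only an illustration under stated assumptions: SHA-256 stands in for the unspecified hash function H, a Python set and dict stand in for HT1 and HT2, and the names `unit_hash` and `sampled_fanout` are hypothetical.

```python
import hashlib
from collections import defaultdict


def unit_hash(key: str) -> float:
    """Map a string to [0, 1) using the top 53 bits of SHA-256 (a stand-in for H)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


def sampled_fanout(packets, p):
    """Estimate per-source fan-out from (src, dst) packets with flow sampling rate p."""
    seen_flows = set()                # plays the role of HT1: duplicate detection
    sampled_count = defaultdict(int)  # plays the role of HT2: sampled flows per source
    for src, dst in packets:
        label = f"{src}->{dst}"
        if unit_hash(label) >= p:     # flow not sampled: H(flow label) >= p
            continue
        if label in seen_flows:       # duplicate packet of an already counted flow
            continue
        seen_flows.add(label)
        sampled_count[src] += 1
    # Compensate for sampling: scale each sampled count E by 1/p.
    return {src: count / p for src, count in sampled_count.items()}
```

Every packet of a sampled flow touches the first table, which is why the slides name the HT1 query as the bottleneck that forces p to stay small.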

Slide 9: Contribution of this paper
- Network data streaming
  - Process each and every incoming packet in real time
  - Employ a small and fast memory
  - Maintain only the most pertinent information
- Two schemes
  - Simple scheme: filtering after sampling
  - Advanced scheme: separation of counting and identity gathering
    - Includes more information

Slide 10: Simple Scheme - System
- Filtering after sampling
[Figure: system diagram]
- Data streaming module
  - Replaces the hash table
  - Final goal: improve the sampling rate

Slide 11: Simple Scheme - Data Streaming Module
- How to realize it
  - Use a bit array to mark new flows
    - Bit array G: w bits
    - Hash function: maps the flow label to a value uniformly distributed in [1, w]
  - Use SRAM (static random access memory)
[Figure: the flow label of a packet is hashed by H( ) to index i of a bit array with positions 0 to w-1]

Slide 12: Simple Scheme - Estimation
- Hash collisions in the data streaming module
  - Different flows can map to the same index of G
  - The colliding flow then misses its hash table update
- Compensation for collisions
  - When a new flow arrives:
    - u: the current number of "0" bits in G
    - i: the hash result of the new flow
    - P(G[i] = 0) = u/w
  - Compensate for the hash collisions by adding w/u
- Unbiased estimation of the count
  - Hash table updated by K flows

Slide 13: Simple Scheme - Algorithm
[Figure: algorithm listing with the "Compensation" and "Calculation" steps marked]
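
A rough Python sketch of how the bit-array filter and the w/u compensation described on the previous slides could fit together follows. It is not the paper's listing: the class and method names are hypothetical, SHA-256 stands in for the hash functions, hash-based flow sampling is assumed for the sampling step, and bit indices run from 0 to w-1.

```python
import hashlib
from collections import defaultdict


def unit_hash(key: str) -> float:
    """Map a string to [0, 1) using the top 53 bits of SHA-256 (stand-in hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)


class SimpleScheme:
    def __init__(self, w: int, p: float):
        self.w = w                       # size of bit array G
        self.p = p                       # sampling rate
        self.bits = bytearray(w)         # G, all zeros initially
        self.zeros = w                   # u: number of "0" bits left in G
        self.count = defaultdict(float)  # hash table: compensated count per source

    def process(self, src: str, dst: str) -> None:
        label = f"{src}->{dst}"
        if unit_hash("sample|" + label) >= self.p:
            return                       # flow not sampled (assumed hash-based sampling)
        i = min(int(unit_hash("index|" + label) * self.w), self.w - 1)
        if self.bits[i]:
            return                       # seen before, or collided with another flow
        # A new flow finds a "0" bit with probability u/w, so count it as w/u.
        self.count[src] += self.w / self.zeros
        self.bits[i] = 1
        self.zeros -= 1

    def fanout(self, src: str) -> float:
        """Compensated count scaled by 1/p to undo sampling."""
        return self.count.get(src, 0.0) / self.p
```

Only packets that find a "0" bit reach the hash table, so most duplicate packets are filtered in fast memory before any table update.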

Slide 14: Simple Scheme - Analysis
- Unbiased estimator of fan-out
- Saturation avoidance
  - Number of "0" elements; probability of being recorded
  - Minimum number of "0" elements typically set around w/2 (half full)
  - Two sets of arrays and hash tables operate alternately
- Sampling rate improved
  - Affordable SRAM
    - Little memory is needed to support high-speed links
  - Streaming speed
    - Poisson-like update times of the hash table
    - Efficient hardware implementation of the hash function
    - All operations in the data streaming module finish in about 10 ns
  - Bottleneck!

Slide 15: Advanced Scheme - System
- (1) Record flow information into the array in real time
- (2) Record the source identity (e.g., source IP)
- Offline, use the source identity (2) to look up the array (1) and estimate the fan-out

Slide 16: Advanced Scheme - Streaming algorithm
- 2D bit array A(m, n)
- Four hash functions
  - One selects the row number (range [1, m])
  - Three select the column numbers (range [1, n]); in this case k = 3
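
The per-packet update can be sketched in Python as follows. This is a guess at the structure from the slide text, not the paper's code: it assumes the row is chosen by hashing the flow label and the k columns by hashing the source, uses SHA-256 as a stand-in hash, stores each column as an m-bit integer, and uses 0-based indices. `BitMatrix`, `bucket`, and `columns_of` are hypothetical names.

```python
import hashlib


def bucket(key: str, size: int) -> int:
    """Hash a string to an integer in [0, size) (SHA-256 as a stand-in hash)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % size


class BitMatrix:
    def __init__(self, m: int, n: int, k: int = 3):
        self.m, self.n, self.k = m, n, k
        self.columns = [0] * n  # column j is an m-bit integer, bit r = A[r][j]

    def columns_of(self, src: str):
        """The k columns assigned to a source (assumed: k hashes of the source)."""
        return [bucket(f"col{i}|{src}", self.n) for i in range(self.k)]

    def update(self, src: str, dst: str) -> None:
        """Set one bit in each of the source's k columns for this packet."""
        row = bucket(f"row|{src}->{dst}", self.m)  # assumed: row from the flow label
        for col in self.columns_of(src):
            self.columns[col] |= 1 << row
```

Repeated packets of the same flow set the same bits again, so duplicates cost nothing extra and no per-flow state is kept.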

Slide 17: Advanced Scheme - Streaming module
[Figure: row collision and column collision]
- Why k = 3?

Slide 18: The linear-time probabilistic counting algorithm
- Idea from the database field: counting the number of unique values in the presence of duplicates
- Estimation of the number of distinct flows
  - m: column size
  - n: total number of flows
  - A_j: the jth element of the column
  - U_n: the number of elements whose value is "0"
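
The estimator itself did not survive in the transcript. The standard linear-time probabilistic counting estimate, written with the symbols above, is n_hat = m * ln(m / U_n). A minimal Python sketch, assuming a column is stored as an m-bit integer and using the hypothetical name `linear_count`:

```python
import math


def linear_count(bitmap: int, m: int) -> float:
    """Linear counting estimate m * ln(m / U) for an m-bit bitmap held in an int."""
    zeros = m - bin(bitmap).count("1")  # U: number of bits still "0"
    if zeros == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return m * math.log(m / zeros)
```

For example, a 64-bit column with half of its bits set yields an estimate of 64 * ln 2, about 44 distinct flows, which is more than the 32 set bits because some flows are assumed to have collided.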

Slide 19: Joint relation calculation
- The distinct values in the join of two relations
  - |A ∩ B| = |A| + |B| - |A ∪ B|
  - A -> G1, B -> G2
- Estimate them by linear counting D based on G
  - |A ∩ B| ≈ D(G1) + D(G2) - D(G1 ∪ G2)
- Note: G1 ∩ G2 cannot be used directly, because it lies in a different space
[Figure: A ∩ B versus G1 ∩ G2; A ∪ B versus G1 ∪ G2]
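
In code, the identity amounts to three linear counting estimates, with the union taken as a bitwise OR of the two bit arrays. The sketch below is illustrative only: it assumes bitmaps stored as m-bit Python integers, and the function names are hypothetical.

```python
import math


def linear_count(bitmap: int, m: int) -> float:
    """Linear counting estimate m * ln(m / zeros) for an m-bit bitmap."""
    zeros = m - bin(bitmap).count("1")
    if zeros == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return m * math.log(m / zeros)


def intersection_estimate(g1: int, g2: int, m: int) -> float:
    """Estimate |A ∩ B| as D(G1) + D(G2) - D(G1 ∪ G2), with ∪ as bitwise OR."""
    return linear_count(g1, m) + linear_count(g2, m) - linear_count(g1 | g2, m)
```

The bitwise AND of G1 and G2 is deliberately not used: two different values can set the same bit in both arrays, so the AND is not the bitmap of A ∩ B.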

Slide 20: Advanced Scheme - Estimation module
- Computing the join selectivity in three columns (k = 3)
  - U: bitwise OR
- Avoid two sources both being hashed to the same k columns
  - S: total number of distinct sources
  - n: number of columns
  - The probability of collision drops to
  - When n = 16,000, S = 100,000, k = 3
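
The estimator for three columns is not preserved in the transcript. One natural reading, offered here only as an assumption, extends the two-set identity by inclusion-exclusion: the flows common to all three columns are estimated from the single-column, pairwise-OR, and triple-OR linear counts. The names below are hypothetical.

```python
import math
from itertools import combinations


def linear_count(bitmap: int, m: int) -> float:
    """Linear counting estimate m * ln(m / zeros) for an m-bit bitmap."""
    zeros = m - bin(bitmap).count("1")
    if zeros == 0:
        raise ValueError("bitmap is saturated; the estimate is undefined")
    return m * math.log(m / zeros)


def fanout_three_columns(c1: int, c2: int, c3: int, m: int) -> float:
    """Assumed k = 3 estimate: sum D(Ci) - sum D(Ci | Cj) + D(C1 | C2 | C3)."""
    cols = (c1, c2, c3)
    singles = sum(linear_count(c, m) for c in cols)
    pairs = sum(linear_count(a | b, m) for a, b in combinations(cols, 2))
    triple = linear_count(c1 | c2 | c3, m)
    return singles - pairs + triple
```

Bits set by other sources that share only one or two of the three columns largely cancel in this combination, which is the motivation for hashing each source to k columns rather than one.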

Slide 21: Advanced Scheme - Identity module
- Purpose
  - Capture the identities of potential super sources
  - Write data into DRAM in real time
- Identity collection
  - Estimate the corresponding fan-out as input data
- Why DRAM?
  - Replaces expensive hash table operations
  - Sequential writes can be very fast
  - 100% and 25% recording for OC-192 and OC-768

Slide 22: Evaluation
- Real Internet traffic traces
  - UNC (1 Gbps), USC, NLANR (IPKS+, IPKS-) (OC-192 link)

Slide 23: Evaluation - Simple Scheme
- [UNC] Sampling rate: 1/4; bit array size: 128 Kb
  - Area I: false positives
  - Area II: false negatives

Slide 24: Evaluation - Advanced Scheme
- [UNC] 2D bit array A: 128 KB (64 x 16,384); sampling rate: 1

Slide 25: Estimation Accuracy

Slide 26: Summary
- Monitoring at high speed is challenging
- Network data streaming
  - Keeps up with the line speed
  - Includes more pertinent information
- Draws on results from other fields

Slide 27: Q&A