1 Polytechnic University, ECE Department
Detection of “Hot Spots”
Paper Title: Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations
Liang, Chao

2 Motivation
• “Hot spots” in the Internet
  – Super source (large fan-out): e.g., a host infected by a worm such as Slammer
  – Super destination (large fan-in): e.g., the victim of a DDoS attack
• Internet attacks are increasing in severity
  – Network security monitoring is needed
• Challenges
  – High packet arrival rate
  – Speed requirements on RAM (DRAM vs. SRAM)
  – Per-flow state maintenance is impractical

3 How to find the needle in the haystack
• IP flow
  – Abstraction: a set of packets sharing the same addresses, ports, etc.
  – Flow label: the source-destination pair
• General problem: heavy distinct-hitters
  – Given a stream of flow-label pairs, find all sources that are paired with a large number of distinct destinations.
  – To detect super destinations, reverse the flow label.
[Figure: flow 1, flow 2, flow 3]
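
To pin down what the schemes approximate, here is the exact per-flow-state computation of heavy distinct-hitters in Python. It keeps one destination set per source, which is exactly the state the Motivation slide calls impractical at line rate, so treat it as a reference baseline only; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def heavy_distinct_hitters(pairs: Iterable[tuple[Hashable, Hashable]],
                           threshold: int) -> dict:
    """Exact baseline: return {src: fan_out} for every source paired with
    at least `threshold` distinct destinations."""
    dests = defaultdict(set)          # per-source set of distinct destinations
    for src, dst in pairs:
        dests[src].add(dst)           # repeated packets of one flow are absorbed by the set
    return {s: len(d) for s, d in dests.items() if len(d) >= threshold}


def super_destinations(pairs, threshold: int) -> dict:
    # Reverse each flow label so that destinations play the role of sources.
    return heavy_distinct_hitters(((dst, src) for src, dst in pairs), threshold)


stream = [("10.0.0.1", f"192.168.0.{i}") for i in range(50)]   # scanner-like source
stream += [("10.0.0.2", "192.168.0.1")] * 20                   # one flow, many packets
print(heavy_distinct_hitters(stream, threshold=10))  # {'10.0.0.1': 50}
```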

4 Weapons
• Previous techniques
  – Flow state maintenance
  – Probabilistic counting
  – Bloom filters
  – Multi-resolution bitmaps
  – ……
• This paper
  – Sampling
  – Network data streaming

5 Paper
• Qi Zhao, Abhishek Kumar, Jun Xu, “Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations”, IMC 2005

6 Outline of the rest of the talk
• One previous work
  – Traditional hash-based flow sampling
• Main approach
  – Simple scheme
  – Advanced scheme
• Evaluation
• Summary

7 Traditional hash-based flow sampling
• Flow sampling
  – Sample flows with a fixed probability p
  – A hash function H maps each flow label to a value uniformly distributed in [0, 1); the flow is sampled if H(flow label) < p, so all packets of a flow receive the same decision
• Hash tables
  – HT1 detects and discards duplicates: it is indexed by hashing the flow label, and each element is a list of flow-label pairs
  – HT2 counts flows per source: it is indexed by hashing srcIP, and each element is a list of ⟨srcIP, count⟩ pairs
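
A minimal Python sketch of this baseline follows. It assumes a SHA-256-derived value for H and uses a Python set and dict as stand-ins for the DRAM hash tables HT1 and HT2; the function names are hypothetical. It also applies the Ē = E · (1/p) compensation from the fan-out calculation.

```python
import hashlib
from collections import defaultdict


def unit_hash(label: str) -> float:
    """Map a flow label to a pseudo-uniform value in [0, 1) (SHA-256, top 53 bits)."""
    digest = hashlib.sha256(label.encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def sampled_fan_out(pairs, p: float) -> dict:
    """Hash-based flow sampling with two hash tables (Python set/dict stand-ins)."""
    ht1 = set()               # HT1: labels of sampled flows already seen (duplicate check)
    ht2 = defaultdict(int)    # HT2: srcIP -> number of distinct sampled flows (E)
    for src, dst in pairs:
        label = f"{src}|{dst}"
        if unit_hash(label) >= p:   # sample the flow only if H(flow label) < p
            continue
        if label in ht1:            # later packet of a flow that is already counted
            continue
        ht1.add(label)
        ht2[src] += 1
    # Compensate for sampling: E_bar = E * (1/p)
    return {src: e / p for src, e in ht2.items()}
```

Note that every packet of a sampled flow still performs an HT1 lookup, which is why this table becomes the bottleneck at high packet rates.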

8 Traditional hash-based flow sampling
• Fan-out calculation
  – Compare each source's count with a threshold to report super sources
  – Compensate for sampling: Ē = E · (1/p)
• Performance analysis
  – Key limitation: a low sampling rate, caused by the update cost of the hash tables (in DRAM) and by elephant flows (every packet of a sampled flow still queries HT1)
  – Performance bottleneck: queries to the first hash table
  – Result: the sampling rate must satisfy p ≪ Hs / Tr, where Hs is the operating speed of the hash table and Tr is the traffic arrival rate
  – The estimation error scales with 1/p, and p is too low!
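
To see why the error grows as p shrinks, assume (an idealization) that each distinct flow of a source is sampled independently with probability p, so the sampled count E of a source with true fan-out F is Binomial(F, p). Then:

```latex
\bar{E} = \frac{E}{p}, \qquad
\mathbb{E}[\bar{E}] = \frac{Fp}{p} = F, \qquad
\operatorname{Var}[\bar{E}] = \frac{Fp(1-p)}{p^{2}} = \frac{F(1-p)}{p}, \qquad
\frac{\sqrt{\operatorname{Var}[\bar{E}]}}{F} = \sqrt{\frac{1-p}{pF}}
```

For example, with F = 1,000, p = 1% gives a relative standard error of about 31%, whereas p = 25% gives about 5.5%.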

9 Contribution of this paper
• Network data streaming
  – Process each and every incoming packet in real time
  – Employ a small and fast memory
  – Maintain only the most pertinent information
• Two schemes
  – Simple scheme: filtering after sampling
  – Advanced scheme: separation of counting and identity gathering; captures more information

10 Simple Scheme – System
• Filtering-after-sampling system
• Data streaming module
  – Replaces the duplicate-detection hash table (HT1)
  – Final goal: improve the sampling rate

11 Simple Scheme – Data Streaming Module
• How it works
  – A bit array G of w bits marks flows that have already been seen
  – A hash function H maps each flow label to an index uniformly distributed in [0, w−1]; a flow is new if G[H(flow label)] = 0, and that bit is then set to 1
  – G is kept in SRAM (static random access memory)
[Figure: the flow label of a packet is hashed by H(·) to index i of G (indices 0 to w−1), and G[i] is set to 1]
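
The bit-array filter can be sketched in Python as below. SHA-256 stands in for the hardware hash H, and one byte per bit is used for readability (an SRAM implementation would pack bits); the class and method names are hypothetical.

```python
import hashlib


class NewFlowFilter:
    """Bit array G of w bits used to decide whether a sampled flow is new."""

    def __init__(self, w: int):
        self.w = w
        self.G = bytearray(w)     # one byte per bit for clarity

    def _index(self, label: str) -> int:
        # Hypothetical H(): map the flow label to an index in [0, w-1].
        digest = hashlib.sha256(label.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def is_new(self, label: str) -> bool:
        i = self._index(label)
        if self.G[i]:
            return False          # bit already set: treated as a duplicate (may be a collision)
        self.G[i] = 1
        return True
```

A `False` result is correct for a repeated packet of the same flow but wrong after a hash collision between two different flows; that error is what the estimation step has to compensate for.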

12 Simple Scheme – Estimation
• Hash collisions in data streaming
  – Different flows may map to the same index of G
  – A new flow whose bit is already set is mistaken for a duplicate, so its hash-table update is missed
• Compensating for collisions
  – When a new flow arrives, let u be the number of “0” bits in G and i the hash result of the flow; then P(G[i] = 0) = u/w
  – Compensate for hash collisions by adding w/u (instead of 1) to the source's counter
• Unbiased estimate of the count
  – Computed from the K flows that updated the hash table
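
Putting the slide's definitions together, one way to write the estimate for a source s is shown below. This is a reconstruction from the stated definitions, not the slide's original formula: K is the number of hash-table updates for s, u_j is the number of “0” bits in G just before the j-th update, and the 1/p factor undoes flow sampling.

```latex
\hat{F}_s = \frac{1}{p} \sum_{j=1}^{K} \frac{w}{u_j},
\qquad
\mathbb{E}[\text{contribution of one new sampled flow}]
  = \frac{u}{w}\cdot\frac{w}{u} + \left(1-\frac{u}{w}\right)\cdot 0 = 1
```

Each new sampled flow is recorded only if its bit is still 0, so its expected contribution to the sum is exactly 1, which is what makes the estimate unbiased.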

13 Simple Scheme – Algorithm
• Compensation calculation
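
The following Python sketch combines the simple scheme's steps under stated assumptions: SHA-256-derived values stand in for both the sampling hash and the bit-array hash, a dict stands in for the DRAM hash table, and the w/u compensation and final 1/p scaling follow the estimation slide. It is not the paper's pseudocode (which did not survive extraction), and `simple_scheme` is a hypothetical name.

```python
import hashlib
from collections import defaultdict


def _unit(label: str, salt: str) -> float:
    # Pseudo-uniform value in [0, 1) derived from SHA-256 (top 53 bits).
    d = hashlib.sha256((salt + label).encode()).digest()
    return (int.from_bytes(d[:8], "big") >> 11) / 2**53


def simple_scheme(pairs, p: float, w: int) -> dict:
    """Filtering after sampling with w/u collision compensation (sketch)."""
    G = bytearray(w)                  # bit array (SRAM in hardware)
    u = w                             # number of zero bits currently in G
    counts = defaultdict(float)       # hash table: srcIP -> compensated count
    for src, dst in pairs:
        label = f"{src}|{dst}"
        if _unit(label, "sample") >= p:     # hash-based flow sampling
            continue
        i = int(_unit(label, "index") * w)  # index in [0, w-1]
        if G[i]:
            continue                  # duplicate, or a new flow lost to a collision
        counts[src] += w / u          # compensation uses u *before* setting the bit
        G[i] = 1
        u -= 1
    return {src: c / p for src, c in counts.items()}
```

Only new flows touch the hash table, so the table sees one update per distinct sampled flow rather than one lookup per sampled packet.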

14 Simple Scheme – Analysis
• Unbiased estimator of fan-out
• Saturation avoidance
  – The fewer “0” bits remain in G, the lower the probability that a new flow is recorded
  – The minimum number of “0” bits is typically set around w/2 (half full)
  – Two sets of arrays and hash tables are operated alternately
• Improved sampling rate
  – Affordable SRAM: little memory is needed to support high-speed links
  – Streaming speed: hash-table updates arrive in a Poisson-like fashion; hash functions have efficient hardware implementations; all operations in the data streaming module can finish in about 10 ns
  – Remaining bottleneck: the hash-table updates
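
A sketch of the alternating two-set arrangement follows, assuming a set is handed off for readout once its number of “0” bits falls to w/2. The class name, the dict layout, and the immediate reset (standing in for offline readout and clearing) are illustrative assumptions. The caller supplies the bit index, so any hash can be plugged in.

```python
from collections import defaultdict
from typing import Optional


class AlternatingRecorder:
    """Two (bit array G, hash table) sets used alternately (hypothetical sketch)."""

    def __init__(self, w: int, min_zeros: Optional[int] = None):
        self.w = w
        self.min_zeros = w // 2 if min_zeros is None else min_zeros  # "half full" rule
        self.sets = [self._fresh(), self._fresh()]
        self.active = 0
        self.closed = []          # compensated counts of finished measurement epochs

    def _fresh(self) -> dict:
        return {"G": bytearray(self.w), "u": self.w, "table": defaultdict(float)}

    def record(self, src: str, index: int) -> None:
        """Record a sampled flow of `src` whose label hashed to `index` in [0, w-1]."""
        s = self.sets[self.active]
        if not s["G"][index]:
            s["table"][src] += self.w / s["u"]   # w/u collision compensation
            s["G"][index] = 1
            s["u"] -= 1
        if s["u"] <= self.min_zeros:
            # Hand the saturated set off for readout and switch to the other set.
            self.closed.append(dict(s["table"]))
            self.sets[self.active] = self._fresh()   # stands in for clearing it offline
            self.active ^= 1
```

Capping occupancy at half full keeps w/u at most 2, which bounds how much any single recorded flow can inflate a counter.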

15 Advanced Scheme – System
• (1) Streaming module: records flow information into an array in real time
• (2) Identity module: records source identities (e.g., source IP)
• (3) Estimation module: uses the source identities from (2) to look up the array from (1) and estimate fan-outs offline

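16 Advanced Scheme – Streaming algorithm
• 2D bit array A(m, n): m rows, n columns
• Four hash functions (here k = 3)
  – One gives the row number (range [1, m])
  – Three give the column numbers (range [1, n])
  – The row is derived from the flow label and the k columns from the source, so all flows of a source fall in the same k columns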
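
A Python sketch of the per-packet update follows. It assumes the row is hashed from the flow label and the k columns from the source address, uses salted SHA-256 as a stand-in for the four hash functions, and uses 0-based indices ([0, m−1] and [0, n−1]) instead of the slide's 1-based ranges. `BitArray2D` and its methods are hypothetical names.

```python
import hashlib


def _h(data: str, salt: str, modulus: int) -> int:
    # Hypothetical hash: salted SHA-256 reduced to [0, modulus - 1].
    d = hashlib.sha256(f"{salt}:{data}".encode()).digest()
    return int.from_bytes(d[:8], "big") % modulus


class BitArray2D:
    """m x n bit array A; each packet sets k bits (one row, k columns)."""

    def __init__(self, m: int, n: int, k: int = 3):
        self.m, self.n, self.k = m, n, k
        self.A = [bytearray(n) for _ in range(m)]   # A[row][col]

    def columns(self, src: str) -> list[int]:
        # k column hashes of the source address (distinct salts per hash).
        return [_h(src, f"col{t}", self.n) for t in range(self.k)]

    def row(self, src: str, dst: str) -> int:
        # Row hash of the whole flow label.
        return _h(f"{src}|{dst}", "row", self.m)

    def record(self, src: str, dst: str) -> None:
        r = self.row(src, dst)
        for c in self.columns(src):
            self.A[r][c] = 1
```

Because the row depends on the whole flow label, repeated packets of one flow set the same k bits, so duplicates are absorbed without any table lookup.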

17 Advanced Scheme – Streaming module
[Figure: row collision and column collision in array A]
• Why k = 3?

18 The linear-time probabilistic counting algorithm
• Idea from the database field: counting the number of unique values in the presence of duplicates
• Estimating the number of distinct flows in a column
  – m: column size (number of bits)
  – n: total number of distinct flows hashed into the column
  – A_j: the j-th element of the column
  – U_n: the number of elements whose value is “0”
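
The slide's formula did not survive extraction. The standard linear-counting estimator (from Whang et al.'s linear-time probabilistic counting), written with the slide's symbols, is shown below; treat it as the textbook form rather than a copy of the slide.

```latex
\mathbb{E}[U_n] = m\left(1-\frac{1}{m}\right)^{n} \approx m\,e^{-n/m}
\qquad\Longrightarrow\qquad
\hat{n} = -m \ln\frac{U_n}{m} = m \ln\frac{m}{U_n}
```

For example, a 64-bit column with U_n = 32 zero bits gives n̂ = 64 ln 2 ≈ 44.4 distinct flows. The estimate is undefined when U_n = 0, i.e., when the column is saturated.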

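19 Joint relation calculation
• Number of distinct values in the join of two relations
  – $|A \cap B| = |A| + |B| - |A \cup B|$
  – $A \rightarrow G_1$, $B \rightarrow G_2$ (bitmaps)
• Estimate each term by linear counting D on the bitmaps
  – $|A \cap B| \approx D(G_1) + D(G_2) - D(G_1 \cup G_2)$, where $G_1 \cup G_2$ is the bitwise OR
  – Note: $|A \cap B|$ cannot be estimated directly from $G_1 \cap G_2$, because sets and bitmaps live in different spaces: a bit set in both bitmaps may have been set by different elements
[Figure: A ∩ B vs. G1 ∩ G2, and A ∪ B vs. G1 ∪ G2]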
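
The inclusion-exclusion estimate can be tried out with a small Python sketch: `bitmap` hashes a set into an m-bit bitmap (SHA-256 as an assumed hash), `linear_count` applies D(G) = m ln(m/U), and `intersection_estimate` combines them. All names are hypothetical, and both bitmaps must be built with the same hash function and size.

```python
import hashlib
import math


def _bit(item: str, m: int) -> int:
    # Assumed hash: SHA-256 reduced to a bit position in [0, m-1].
    d = hashlib.sha256(item.encode()).digest()
    return int.from_bytes(d[:8], "big") % m


def bitmap(items, m: int) -> list[int]:
    g = [0] * m
    for x in items:
        g[_bit(x, m)] = 1
    return g


def linear_count(g) -> float:
    """D(G) = m * ln(m / U), U = number of zero bits (requires U > 0)."""
    m, u = len(g), g.count(0)
    return m * math.log(m / u)


def intersection_estimate(g1, g2) -> float:
    """|A ∩ B| ≈ D(G1) + D(G2) - D(G1 OR G2)."""
    union = [a | b for a, b in zip(g1, g2)]   # bitwise OR of the two bitmaps
    return linear_count(g1) + linear_count(g2) - linear_count(union)
```

OR-ing the bitmaps is exactly the bitmap of A ∪ B, which is why the union (unlike the intersection) can be estimated directly.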

20 Advanced Scheme – Estimation module
• Computing the join selectivity across three columns (k = 3)
  – ∪ denotes bitwise OR
• Avoiding two sources that both hash to the same k columns
  – S: total number of distinct sources
  – n: number of columns
  – With n = 16,000, S = 100,000, and k = 3, the probability of collision drops to 0.002
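
With k = 3, the fan-out of a source can be estimated as the size of the intersection of its three columns, using inclusion-exclusion: D(C1) + D(C2) + D(C3) − D(C1 ∪ C2) − D(C1 ∪ C3) − D(C2 ∪ C3) + D(C1 ∪ C2 ∪ C3). The Python sketch below assumes the array layout (row from the flow label, columns from the source) and salted SHA-256 hashes; it is a self-contained illustration, not the paper's code, and all function names are hypothetical.

```python
import hashlib
import math
from itertools import combinations


def _h(data: str, salt: str, modulus: int) -> int:
    # Hypothetical salted hash reduced to [0, modulus - 1].
    d = hashlib.sha256(f"{salt}:{data}".encode()).digest()
    return int.from_bytes(d[:8], "big") % modulus


def source_columns(src: str, n: int, k: int = 3) -> list:
    return [_h(src, f"col{t}", n) for t in range(k)]


def record(A, src: str, dst: str, k: int = 3) -> None:
    # Row from the flow label, k columns from the source (assumed layout).
    r = _h(f"{src}|{dst}", "row", len(A))
    for c in source_columns(src, len(A[0]), k):
        A[r][c] = 1


def linear_count(bits) -> float:
    m, u = len(bits), bits.count(0)
    return m * math.log(m / u)          # assumes at least one zero bit


def fan_out_estimate(A, cols) -> float:
    """Estimate |C1 ∩ ... ∩ Ck| by inclusion-exclusion over bitwise ORs of columns."""
    columns = [[row[c] for row in A] for c in cols]    # each column has m bits
    total = 0.0
    for size in range(1, len(columns) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(columns, size):
            merged = [int(any(bits)) for bits in zip(*subset)]   # bitwise OR
            total += sign * linear_count(merged)
    return total
```

Flows of other sources that share only one or two of the three columns cancel out in the inclusion-exclusion sum; only a source mapped to all three of the same columns biases the estimate, which is the collision the slide bounds.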

21 Advanced Scheme – Identity module
• Purpose
  – Capture the identities of potential super sources
  – Write the data into DRAM in real time
• Identity collection
  – The collected identities are the input for which the corresponding fan-outs are estimated
• Why DRAM?
  – Replaces expensive hash-table operations
  – Sequential writes can be very fast: 100% recording on OC-192 links and 25% on OC-768 links
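
A Python sketch of the identity module and the offline query is below. The per-packet recording probability (e.g., 1.0 or 0.25, echoing the OC-192 and OC-768 figures) and the use of `random.Random` are assumptions about how partial recording might be done; `estimate` is any fan-out estimator supplied by the caller, and all names are hypothetical.

```python
import random
from typing import Callable, Iterable


class IdentityLog:
    """Append-only log of source identities (stand-in for sequential DRAM writes)."""

    def __init__(self, record_prob: float = 1.0, seed: int = 0):
        self.record_prob = record_prob     # assumed per-packet recording probability
        self.log: list[str] = []
        self._rng = random.Random(seed)

    def observe(self, src: str) -> None:
        if self._rng.random() < self.record_prob:
            self.log.append(src)           # sequential write: no lookup, no update


def report_super_sources(log: Iterable[str],
                         estimate: Callable[[str], float],
                         threshold: float) -> dict:
    """Offline: estimate the fan-out of each distinct logged source, keep large ones."""
    result = {}
    for src in dict.fromkeys(log):         # de-duplicate while preserving order
        est = estimate(src)
        if est >= threshold:
            result[src] = est
    return result
```

Even at 25% recording, a source that sends many packets is very likely to be logged at least once, which is all the identity module needs; the fan-out itself comes from the array, not from the log.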

22 Evaluation
• Real Internet traffic traces
  – UNC (1 Gbps), USC, NLANR (IPKS+, IPKS-) (OC-192 link)

23 Evaluation – Simple Scheme
• [UNC] Sampling rate: 1/4; bit array size: 128 Kb
  – Area I: false positives; Area II: false negatives

24 Evaluation – Advanced Scheme
• [UNC] 2D bit array A: 128 KB (64 × 16,384 = 2^20 bits); sampling rate: 1

25 Estimation Accuracy

26 Summary
• Monitoring at high speed is challenging
• Network data streaming
  – Keeps up with the line speed
  – Captures more pertinent information
• Borrows achievements from other fields (e.g., linear counting from databases)

27 Q&A

