1 Reversible Sketches for Efficient and Accurate Change Detection over Network Data Streams
Robert Schweller, Ashish Gupta, Elliot Parsons, Yan Chen
Computer Science Department, Northwestern University
2 Online Change Detection
- Network anomalies are common: flash crowds, failures, DoS, worms, ...
- Online detection over data streams
- Data stream: key/update pairs (k, u)
  - Heavy hitters (lots of prior work)
  - Heavy changes
3 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
- First to detect flow-level heavy changes in massive data streams at network traffic speeds
[Figure: H hash tables (rows 1, ..., j, ..., H), each with K buckets (columns 0, 1, ..., K-1)]
4 k-ary sketch [Krishnamurthy, Sen, Zhang, Chen, 2003]
[Figure: H hash tables (rows 1, ..., j, ..., H), each with K buckets (columns 0, 1, ..., K-1); key k lands in bucket h_1(k), ..., h_j(k), ..., h_H(k) of the corresponding table]
- Update (k, u): T_j[h_j(k)] += u (for all j)
- Estimate v(S, k): sum of updates for key k
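The update and estimate operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `KarySketch`, the simple linear hash functions, and the bucket-correction-plus-median estimator are assumptions added here (the slide only states that the estimate approximates the sum of updates for key k).

```python
import random
from statistics import median


class KarySketch:
    """Toy k-ary sketch: H hash tables with K buckets each (illustrative only)."""

    def __init__(self, H=5, K=4096, seed=0):
        rng = random.Random(seed)
        self.H, self.K = H, K
        self.P = (1 << 61) - 1  # large prime for the stand-in hash family
        # Assumption: one random linear hash per table; a real deployment
        # would pick a hash family with stronger independence guarantees.
        self.params = [(rng.randrange(1, self.P), rng.randrange(self.P))
                       for _ in range(H)]
        self.T = [[0] * K for _ in range(H)]
        self.total = 0  # sum of all updates seen so far

    def _h(self, j, k):
        a, b = self.params[j]
        return ((a * k + b) % self.P) % self.K

    def update(self, k, u):
        # Update (k, u): T_j[h_j(k)] += u for all j
        for j in range(self.H):
            self.T[j][self._h(j, k)] += u
        self.total += u

    def estimate(self, k):
        # Assumed estimator: subtract the average bucket load from each
        # table's bucket, rescale, and take the median across tables.
        K = self.K
        return median((self.T[j][self._h(j, k)] - self.total / K) / (1 - 1 / K)
                      for j in range(self.H))
```

With a single key in the stream the estimate is exact; with many keys, collisions add noise that the median across tables is meant to dampen.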
6 Main problem
- Cannot efficiently report keys with heavy change
Our contribution
- Determine the set of keys that have "large" estimates in the sketch
- Requires very little space:
  - E.g. 5 hash tables with 16 K buckets = 80 KB
  - Fits in high-speed memory
7 Reverse Sketch Problem
Input:
- Sketch
- Threshold
Output: set of keys that hash to heavy buckets in a majority of (or all) hash tables
[Figure: hash tables 1 to 5, each with one bucket marked "Heavy"]
8 Outline
[Figure: two-stage pipeline. Streaming data recording: (key, value) into the k-ary sketch. Heavy change detection: k-ary sketch plus change threshold yields heavy change keys. Stage labels: fast, slow]
- Modular hashing
- IP mangling
- Reverse hashing algorithms
- Improve heavy change detection
9 Taking Intersections
- Intersect A_1, A_2, A_3, A_4, A_5
- H = 5, K = 2^12, #keys = 2^32 (IP addresses)
- E[false positives] << 1
10 The problem with simple intersection
- Why is this difficult? Each set A_i can be very large!
- H = 5, K = 2^12, #keys = 2^32 (IP addresses)
- |A_1| = 2^32 / 2^12 = 2^20
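The two quantities on these slides can be checked with a short calculation. The sketch below assumes idealized hashing (each key lands in each table's bucket uniformly and independently) and exactly one heavy bucket per table; those assumptions, and the variable names, are added here for illustration.

```python
from fractions import Fraction

H = 5            # number of hash tables
K = 2 ** 12      # buckets per table
N = 2 ** 32      # key space (IPv4 addresses)

# Keys that map to any one bucket of one table: |A_i| = N / K
set_size = N // K

# Expected number of keys that fall into the heavy bucket of every table
# by chance alone: N * (1/K)^H
expected_false_positives = Fraction(N, K ** H)

print(set_size == 2 ** 20)                               # True
print(expected_false_positives == Fraction(1, 2 ** 28))  # True
```

So the intersection itself is almost free of false positives, but each of the five sets being intersected holds about a million keys, which is what makes the naive approach expensive.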
11 The problem with simple intersection
- Why is this difficult? Each set A_i can be very large!
- Solution: modular hashing
12 Modular hashing reduces the set size
[Figure: the 32-bit key 10010100101010111001010110100011 (four 8-bit words) is mapped by h() to the 12-bit value 010 110 001 101]
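13 Modular hashing reduces the set size
[Figure: each 8-bit word of the 32-bit key 10010100101010111001010110100011 is hashed separately by h_1(), h_2(), h_3(), h_4() to 3 bits (010, 110, 001, 101); the outputs are concatenated into the 12-bit value 010110001101]
- Greatly reduces the size of reverse-mapped sets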
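14 Modular hashing reduces the set size
[Figure: each 8-bit word of the 32-bit key 10010100101010111001010110100011 is hashed separately by h_1(), h_2(), h_3(), h_4() to 3 bits (010, 110, 001, 101); the outputs are concatenated into the 12-bit value 010110001101]
- Greatly reduces the size of reverse-mapped sets
- Candidates per word: 2^8 / 2^3 = 2^5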
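A minimal sketch of modular hashing as drawn on these slides: the 32-bit key is split into four 8-bit words, each word is hashed to 3 bits, and the results are concatenated into a 12-bit bucket index. The balanced random lookup tables and the function names are assumptions made for this example; the slides do not specify how h_1() to h_4() are built, so the output will not match the bit pattern in the figure.

```python
import random

WORDS, WORD_BITS, OUT_BITS = 4, 8, 3


def make_tables(seed=0):
    """One lookup table per word: 256 inputs, each 3-bit output used 32 times."""
    rng = random.Random(seed)
    tables = []
    for _ in range(WORDS):
        t = [i % (1 << OUT_BITS) for i in range(1 << WORD_BITS)]
        rng.shuffle(t)
        tables.append(t)
    return tables


def modular_hash(key, tables):
    """Hash each 8-bit word separately and concatenate the 3-bit outputs."""
    out = 0
    for j in range(WORDS):
        shift = WORD_BITS * (WORDS - 1 - j)
        word = (key >> shift) & 0xFF
        out = (out << OUT_BITS) | tables[j][word]
    return out


def reverse_words(bucket, tables):
    """For each word position, list the 8-bit words that hash to that 3-bit part."""
    result = []
    for j in range(WORDS):
        shift = OUT_BITS * (WORDS - 1 - j)
        part = (bucket >> shift) & 0b111
        result.append([w for w in range(1 << WORD_BITS) if tables[j][w] == part])
    return result


tables = make_tables()
key = 0b10010100101010111001010110100011  # key shown on the slide
bucket = modular_hash(key, tables)
candidates = reverse_words(bucket, tables)
```

Each entry of `candidates` has 2^8 / 2^3 = 2^5 = 32 words, so the reverse-mapped set can be stored as four small lists instead of one list of 2^20 keys.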
15 Modular hashing reduces the set size
[Figure: hash tables 1 to 5 with heavy buckets b_1, b_2, b_3, b_4, b_5]
- A_1: 2^5 * 2^5 * 2^5 * 2^5
- Intersection: only 32 elements per partition
16 Modular hashing reduces the set size
[Figure: hash tables 1 to 5 with heavy buckets b_1, b_2, b_3, b_4, b_5]
- A_1: 2^5 * 2^5 * 2^5 * 2^5
- A_2: 2^5 * 2^5 * 2^5 * 2^5
- Intersection: only 32 elements per partition
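With one heavy bucket per table, the intersection of A_1, ..., A_H can be computed word by word: a key is in every A_i exactly when each of its 8-bit words is in the per-word candidate set of every table. The sketch below illustrates that idea under assumed details (balanced random lookup tables, hypothetical function names); it is not the reverse hashing algorithm from the tech report, which also handles several heavy buckets per table.

```python
import random
from itertools import product

WORDS, WORD_BITS, OUT_BITS, H = 4, 8, 3, 5
WORD_MASK = (1 << WORD_BITS) - 1
OUT_MASK = (1 << OUT_BITS) - 1


def make_tables(rng):
    """Per-word lookup tables for one hash table (assumed construction)."""
    tables = []
    for _ in range(WORDS):
        t = [i & OUT_MASK for i in range(1 << WORD_BITS)]
        rng.shuffle(t)
        tables.append(t)
    return tables


def modular_hash(key, tables):
    out = 0
    for j in range(WORDS):
        word = (key >> (WORD_BITS * (WORDS - 1 - j))) & WORD_MASK
        out = (out << OUT_BITS) | tables[j][word]
    return out


def word_candidates(bucket, tables, j):
    """The 32 words at position j that hash to the j-th 3-bit part of bucket."""
    part = (bucket >> (OUT_BITS * (WORDS - 1 - j))) & OUT_MASK
    return {w for w in range(1 << WORD_BITS) if tables[j][w] == part}


def intersect_heavy_buckets(buckets, all_tables):
    """Intersect 32-element sets per word, then combine the surviving words."""
    per_word = []
    for j in range(WORDS):
        common = set(range(1 << WORD_BITS))
        for bucket, tables in zip(buckets, all_tables):
            common &= word_candidates(bucket, tables, j)
        per_word.append(sorted(common))
    keys = []
    for words in product(*per_word):
        key = 0
        for w in words:
            key = (key << WORD_BITS) | w
        keys.append(key)
    return keys


rng = random.Random(0)
all_tables = [make_tables(rng) for _ in range(H)]
key = 0x81693817  # 129.105.56.23, used here as the heavy key
buckets = [modular_hash(key, t) for t in all_tables]
recovered = intersect_heavy_buckets(buckets, all_tables)
```

Every set touched during the intersection has at most 32 elements, which is the point of the "only 32 elements per partition" remark.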
17 Handling Multiple Intersections...
[Figure: hash tables 1 to 5, each with more than one heavy bucket among b_1, ..., b_5]
- 2^H different intersections
- Much more difficult: needs sophisticated reverse hashing algorithms (see tech report)
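18 Problem: Too many collisions
[Figure: the addresses 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... share their first three octets, so modular hashing (32 bits to 12 bits) maps them all to 7.4.0.*, where only the last 3-bit word differs]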
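19 Problem: Too many collisions
[Figure: the addresses 129.105.56.23, 129.105.56.28, 129.105.56.109, 129.105.56.35, 129.105.56.98, ... share their first three octets, so modular hashing (32 bits to 12 bits) maps them all to 7.4.0.*, where only the last 3-bit word differs]
- Solution: IP mangling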
20 IP-mangling
21 Invertible Modular Linear Equation
- f(x) ≡ a·x mod n
- To be invertible: a and n must be relatively prime
- a is odd, chosen randomly (so it is relatively prime to n whenever n is a power of two)
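A small sketch of such an invertible mapping, assuming n = 2^32 (the size of the IPv4 key space) so that any odd a is relatively prime to n. The function names are hypothetical, and `pow(a, -1, n)` requires Python 3.8 or later.

```python
import ipaddress
import random

N = 1 << 32  # assumed modulus: size of the IPv4 address space


def make_mangler(seed=0):
    """Pick a random odd multiplier a and its inverse modulo N."""
    rng = random.Random(seed)
    a = rng.randrange(1, N, 2)   # odd, hence relatively prime to 2^32
    a_inv = pow(a, -1, N)        # modular inverse (Python 3.8+)
    return a, a_inv


def mangle(x, a):
    """f(x) = a * x mod N, applied before modular hashing."""
    return (a * x) % N


def unmangle(y, a_inv):
    """Inverse mapping, applied after reverse hashing to recover the key."""
    return (a_inv * y) % N


a, a_inv = make_mangler()
ip = int(ipaddress.IPv4Address("129.105.56.23"))
mangled = mangle(ip, a)
restored = str(ipaddress.IPv4Address(unmangle(mangled, a_inv)))
```

Because multiplication spreads differences in the low-order bits across the higher bits, addresses that share a prefix such as 129.105.56.* no longer share their first three mangled words, while the mapping stays one-to-one so no key is lost.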
22 [Figure panels: Modular Hashing; Optimal Hashing]
23 [Figure panels: Modular Hashing; Modular Hashing with IP Mangling; Optimal Hashing]
24 Recap
- Streaming data recording: key -> IP mangling -> modular hashing -> reversible k-ary sketch (value stored)
- Heavy change detection: reversible k-ary sketch + change threshold -> reverse hashing -> reverse IP mangling -> heavy change keys
25 Evaluation
- Traffic traces from a Northwestern University edge router
  - 5-minute intervals, averaging 7.5 GB of traffic per interval
- Compared with ground truth
- 6 hash tables, 4K buckets each, 192 KB of memory in total
- Up to 140 true heavy change keys found in 1.5 seconds
  - Over 95% TPP
  - Less than 2% FPP
- All missing changes are due to boundary effects
26 Conclusions / Future Work
- Sketches: efficient summary structures
- Our contribution: reversible sketches
  - Efficient online detection of keys with heavy changes
- Work in progress (see tech report)
  - Improved reverse hashing
  - Statistical guarantee on detection accuracy
- More advanced applications:
  - Hierarchical change detection, e.g. 129.105.100.* shows a big change!
27 See tech report for more!
http://list.cs.northwestern.edu
Thank you!