Space-Time Tradeoffs in Software-Based Deep Packet Inspection
Anat Bremler-Barr (IDC Herzliya, Israel), Yotam Harchol⋆ and David Hay (Hebrew University, Israel)
OWASP Israel 2011 (also presented at IEEE HPSR 2011)
Parts of this work were supported by European Research Council (ERC) Starting Grant no. 259085
⋆ Supported by the Check Point Institute for Information Security
Outline
- Motivation
- Background
- New Compression Techniques
- Experimental Results
- Conclusions
Motivation: Network Intrusion Detection Systems
Classify packets according to:
- Header fields: source IP & port, destination IP & port, protocol, etc.
- Packet payload (data): this is Deep Packet Inspection
Deep Packet Inspection: The Environment
- Cache: locality-based, low capacity, fast memory
- (D)RAM: high capacity, slow memory
Our Contributions
- The literature assumes one should try to fit the data structure into the cache, hence the efforts to compress it. Our paper asks: is this beneficial?
- In reality, even in a non-compressed implementation, most memory accesses hit the cache.
- BUT: an attacker can reduce the locality of the non-compressed implementation, pushing it out of the cache and making it much slower!
- How to mitigate this attack? Compress even further. Our new techniques use 60% less memory.
Complexity DoS Attack
1. Find a gap between the average case and the worst case
2. Engineer input that exploits this gap
3. Launch a Denial of Service attack on the system
Aho-Corasick Algorithm [Aho, Corasick; 1975]
- Build a Deterministic Finite Automaton (DFA)
- Traverse the DFA, byte by byte
- Accepting state → pattern found
Example pattern set: {E, BE, BD, BCD, CDBCAB, BCAA}
(DFA diagram omitted. On the input BCDBCAB the automaton visits states s0, s12, s2, s5, s6, s9, s10, s11.)
Aho-Corasick Algorithm: Naïve Implementation [Aho, Corasick; 1975]
- Represent the transition function as a table of |Σ| × |S| entries
  - Σ: alphabet; S: set of states
- Lookup time: one memory access per input symbol
- Space: |Σ| × |S| entries; in reality, 70MB to gigabytes
(Transition-table figure omitted: one row per state s0, s1, ..., one column per symbol A-E.)
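A minimal sketch of the naïve representation (illustrative Python, not the paper's code): build the trie for the slide's example pattern set, fold the failure function into a full |Σ| × |S| table, and match with exactly one table lookup per input byte.

```python
from collections import deque

def build_full_dfa(patterns, alphabet):
    # Trie: goto[s] maps a symbol to the next state; out[s] holds the
    # patterns that end at state s.
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    # BFS over the trie, folding failure links into a full table so that
    # table[s][ch] is defined for every state and symbol.
    fail = [0] * len(goto)
    table = [dict() for _ in goto]
    queue = deque()
    for ch in alphabet:
        t = goto[0].get(ch, 0)
        table[0][ch] = t
        if t:
            queue.append(t)  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches from the failure state
        for ch in alphabet:
            t = goto[s].get(ch)
            if t is None:
                table[s][ch] = table[fail[s]][ch]
            else:
                fail[t] = table[fail[s]][ch]
                table[s][ch] = t
                queue.append(t)
    return table, out

def match(table, out, text):
    s, hits = 0, []
    for i, ch in enumerate(text):
        s = table[s][ch]  # exactly one table lookup per input symbol
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

On the slide's example set this produces the same 15 states s0..s14, and matching the input BCDBCAB reports BCD and CDBCAB. The full table is fast but huge: |Σ| × |S| entries (e.g., 256 byte values × 77,182 Snort states).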
Potential Complexity DoS Attack
1. Exhaustive-traversal adversarial traffic
   - Traverses as many states of the automaton as possible
   - Bad locality: bad for the naïve implementation (will not utilize the cache)
(State-diagram figure omitted.)
Alternative Implementation [Aho, Corasick; 1975]
- A failure transition goes to the state that matches the longest suffix of the input so far
- Lookup time: at most two memory accesses per input symbol (via amortized analysis)
- Space: at most proportional to the number of symbols in the pattern set (depends on implementation)
(Diagram omitted: forward transitions vs. failure transitions; example failure path s10, s5, s7, s0, s1.)
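A sketch of the failure-transition alternative (illustrative Python): store only forward and failure transitions. A single byte may chase several failure links, but each failure transition strictly decreases the state's depth while each forward transition increases it by at most one, so the total is at most 2 × len(text) transitions.

```python
from collections import deque

def build_ac(patterns):
    goto, out = [{}], [set()]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches from the failure state
        for ch, t in goto[s].items():
            # Walk failure links of s until ch can be continued.
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return goto, fail, out

def match(goto, fail, out, text):
    s, hits, transitions = 0, [], 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]          # failure transition
            transitions += 1
        s = goto[s].get(ch, 0)   # forward transition (or stay at the root)
        transitions += 1
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits, transitions
```

Same matches as the full table, with far less memory: only the trie edges and one failure pointer per state.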
Potential Complexity DoS Attack
1. Exhaustive-traversal adversarial traffic
   - Traverses as many states of the automaton as possible
   - Bad locality: bad for the naïve implementation (will not utilize the cache)
2. Failure-path-traversal adversarial traffic
   - Traverses as many failure transitions as possible
   - Bad for the failure-path based automaton (more memory accesses per input symbol)
(State-diagram figure omitted.)
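A toy illustration of the failure-path idea (not the paper's actual attack traffic): input crafted to climb the automaton and repeatedly fall back via failure links takes measurably more transitions per byte, and hence more memory accesses, than input that never leaves the root.

```python
from collections import deque

def build_ac(patterns):
    goto = [{}]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            queue.append(t)
    return goto, fail

def transitions_taken(goto, fail, text):
    s, n = 0, 0
    for ch in text:
        while s and ch not in goto[s]:
            s, n = fail[s], n + 1        # failure transition
        s, n = goto[s].get(ch, 0), n + 1  # forward transition
    return n

goto, fail = build_ac(["E", "BE", "BD", "BCD", "CDBCAB", "BCAA"])
benign = "A" * 99    # no pattern starts with A: one transition per byte
crafted = "BCD" * 33  # climbs the trie, then falls back via failure links
```

On this toy set the crafted input takes roughly 1.6 transitions per byte versus 1.0 for the benign one; the gap grows with larger, deeper pattern sets.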
Prior Work: Compress the State Representation
- Lookup table: per state, store the forward transitions (symbol → next state), the failure pointer, and a match flag
- Bitmap encoding: a bitmap of length |Σ| marks which symbols have forward transitions; bits can be counted using the popcnt instruction to index a packed list of next states
- Linear encoding: store only the existing (symbol, next state) pairs plus a size field
(Per-state encoding figures omitted.)
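A sketch of the bitmap encoding (illustrative Python; the transition values are made up): one bit per alphabet symbol plus a packed array of the next states that actually exist. The rank of a set bit, computed in hardware with a single popcnt, indexes the packed array.

```python
ALPHABET = "ABCDE"  # toy alphabet; real DPI uses all 256 byte values

def encode_state(forward):
    """forward: dict symbol -> next state (only existing transitions)."""
    bitmap, targets = 0, []
    for i, ch in enumerate(ALPHABET):
        if ch in forward:
            bitmap |= 1 << i
            targets.append(forward[ch])
    return bitmap, targets

def next_state(bitmap, targets, failure, ch):
    i = ALPHABET.index(ch)
    if not (bitmap >> i) & 1:
        return failure  # no forward transition on ch: take the failure pointer
    # Number of set bits below position i == index into the packed array.
    # Hardware computes this with one POPCNT instruction.
    rank = bin(bitmap & ((1 << i) - 1)).count("1")
    return targets[rank]

# Hypothetical state with forward transitions A->1, C->3, D->6 and failure 7.
bitmap, targets = encode_state({"A": 1, "C": 3, "D": 6})
```

Per-state space drops from |Σ| pointers to |Σ| bits plus only the pointers that exist.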
Path Compression
- One-way branches can be represented using a single state, similarly to PATRICIA tries
- Problem: incoming failure transitions
- Solution: compress only states with no incoming failure transitions
(Diagram omitted: the one-way chain s8 → s9 → s10 → s11, labeled B, C, A, B, collapses into a single state with edge label BCAB.)
Memory relative to naïve: Tuck et al. (2004): 100%; our path compression: 75%.
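A sketch of the path-compression idea on a plain trie (illustrative Python). It collapses one-way chains into a single edge labeled with the whole substring; the paper's version additionally refrains from compressing states with incoming failure transitions, which this toy ignores.

```python
def compress_paths(trie):
    """trie: nested dicts, with "$" marking end of a pattern.
    Returns an equivalent trie whose one-way chains are merged,
    PATRICIA-style, into multi-symbol edge labels."""
    compressed = {}
    for label, child in trie.items():
        if label == "$":            # end-of-pattern marker, keep as-is
            compressed["$"] = child
            continue
        # Extend the edge label while the child is a pure one-way branch.
        while len(child) == 1 and "$" not in child:
            (step, child), = child.items()
            label += step
        compressed[label] = compress_paths(child)
    return compressed

# {BCD, BCAA}: the B->C prefix and the A->A tail are one-way branches.
trie = {"B": {"C": {"D": {"$": True}, "A": {"A": {"$": True}}}}}
```

Each merged chain replaces several single-symbol states with one multi-symbol state, which is where the 25% saving over Tuck et al. comes from.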
Leaves Compression
- By definition, leaves have no forward transitions; their single purpose is to indicate a match
- We can push this indication up by adding a match bit to each pointer
- Then leaves can be eliminated from the automaton by copying their failure transition up
- 3% more space reduction, and it reduces the number of transitions taken
(Before/after diagrams omitted; starred edges carry the pushed-up match bit.)
Pointer Compression
In the Snort IDS pattern set, 79% of the failure pointers point to states at depths 0, 1, or 2.

Failure-pointer depth distribution: depth 0 (s0): 13%; depth 1: 31%; depth 2: 35%; depth ≥ 3: 21%

Add two bits to encode the depth of the pointer:
- 00: depth 0
- 01: depth 1
- 10: depth 2
- 11: depth 3 and deeper
A pointer to depth ≤ 2 needs only the 2-bit tag; a pointer to depth > 2 needs the 2-bit tag plus the full 16-bit pointer.
Pointer Compression (cont.)
Determine the next state from the pointer depth:
- 0: go to the root
- 1: use a lookup table indexed by the last input symbol
- 2: use a hash table keyed by the last two input symbols
- ≥ 3: use the stored pointer
Example depth-1 lookup table: A: -, B: s2, C: s7, D: -, E: s1
Memory relative to naïve: Tuck et al. (2004): 100%; our path compression: 75%; with pointer compression: 41%.
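A sketch of resolving a depth-encoded failure pointer (illustrative Python; the depth-1 table follows the slide's example, the depth-2 table contents are made up). Only pointers to depth ≥ 3 store the full 16-bit value; shallower ones are reconstructed from the last one or two input symbols.

```python
D0, D1, D2, DEEP = 0b00, 0b01, 0b10, 0b11  # the 2-bit depth tag

def resolve_failure(tag, stored_ptr, last_two, depth1_table, depth2_table):
    if tag == D0:
        return 0                            # depth 0: the root
    if tag == D1:
        return depth1_table[last_two[-1]]   # depth 1: keyed by last symbol
    if tag == D2:
        return depth2_table[last_two]       # depth 2: keyed by last 2 symbols
    return stored_ptr                       # depth >= 3: full pointer stored

# Depth-1 lookup table from the slide (B -> s2, C -> s7, E -> s1);
# hypothetical depth-2 hash table.
depth1 = {"B": 2, "C": 7, "E": 1}
depth2 = {"BC": 5, "CD": 6}
```

This works because a failure target at depth d is, by construction, labeled by the last d input symbols, so the last one or two symbols uniquely identify depth-1 and depth-2 targets without storing a pointer at all.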
Function Inlining
- The compressed implementation makes more memory accesses
- The initial implementation was based on a few functions calling each other
- Avoiding function calls (by inlining their code) reduced the total number of memory reads by 36%
Experimental Setup

Test systems:
- System 1: MacBook Pro, Core 2 Duo 2.53GHz (dual core); L1 data cache: 16KB per core; L2 cache: 3MB (shared)
- System 2: iMac, Core i7 2.93GHz (quad core); L1 data cache: 16KB per core; L2 cache: 256KB per core; L3 cache: 8MB (shared)

Pattern sets:
- Snort: 31,094 patterns; 77,182 states in the naïve implementation
- ClamAV*: 16,710 patterns; 745,303 states in the naïve implementation

Traffic: real-life traffic logs taken from MIT DARPA
* We used only half of the ClamAV signatures for our tests
Space Requirement
(Memory-footprint chart omitted; values shown include 722.14 MB, 2.59 MB, and 1.5 MB.)
Memory Accesses per Input Symbol
(Chart omitted.)
L1 Data Cache Miss Rate
(Chart omitted. Measured on the Intel Core 2 Duo: 2 cores, 16KB L1 data cache, 3MB L2 cache.)
L2 Cache Miss Rate
(Chart omitted. Measured on the Intel Core 2 Duo: 2 cores, 16KB L1 data cache, 3MB L2 cache.)
- Real-life traffic: 0.7% L2 cache miss rate
- Adversarial traffic: 23% L2 cache miss rate
- Maximal L2 miss rate: 0.06%
Space vs. Time
(Scatter chart omitted: our implementation vs. the naïve implementation; an 86% reduction.)
Conclusions
It is crucial to model the cache in software-based Deep Packet Inspection:
- The naïve Aho-Corasick implementation has a huge memory footprint, but works well on real-life traffic due to locality of reference
- The naïve implementation can easily be attacked, making it 7 times slower, even though it makes a constant number of memory accesses per input symbol
- We also show new compression techniques:
  - 60% less memory than the best prior-art compression
  - Stable throughput, better performance under attacks
Questions? Thank you!