Variable-Stride Multi-Pattern Matching For Scalable Deep Packet Inspection
Nan Hua (1), Haoyu Song (2), T. V. Lakshman (2)
(1) Georgia Tech, (2) Bell Labs, Alcatel-Lucent


1 Variable-Stride Multi-Pattern Matching For Scalable Deep Packet Inspection
Nan Hua (1), Haoyu Song (2), T. V. Lakshman (2)
(1) Georgia Tech, (2) Bell Labs, Alcatel-Lucent
April 12, 2015

2 Introduction
All Rights Reserved © Alcatel-Lucent 2009 | IEEE INFOCOM | April 2009
• Deep Packet Inspection (DPI)
  - Stateful inspection of the packet header and the packet payload
  - Network Intrusion Detection & Prevention, Lawful Inspection, Censorship, Quality of Service …
• Focus of this work
  - Fixed String Pattern Matching
• Why is it important?
  - Key component of a signature-based DPI system
  - The basis for advanced inspection
  - Performance bottleneck
• Requirements
  - High-speed, real-time in-line processing
  - Low memory storage and bandwidth consumption
  - Low false positive rate and low miss rate
  - Resilient to worst-case scenarios

3 Classical Algorithm: Aho-Corasick DFA (1975)
• Laid the foundation for most of the latest multi-pattern matching algorithms
• Consumes one byte/character per lookup cycle
  - 10GbE/OC192 -> ~1 gigabyte/sec
• Too many state transitions even for such a small set
  - State fan-out = alphabet size
[Figure: Aho-Corasick DFA for the string set {he, his, him, her}, with the init state and the accept states marked. Failure transitions back to the init state are not shown.]
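
As a point of reference for the one-character-per-lookup behavior described on this slide, here is a minimal sketch of a textbook Aho-Corasick matcher built for the slide's string set {he, his, him, her}. It is not taken from the presentation: the function names `build` and `search` are illustrative, and failure links are followed at search time rather than precompiled into a full DFA table.

```python
from collections import deque

def build(patterns):
    """Build goto, failure, and output tables for a set of string patterns."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: compute failure links for states of depth >= 2.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out

def search(text, automaton):
    """Scan text one character per step; return (start_index, pattern) hits."""
    goto, fail, out = automaton
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits

automaton = build(["he", "his", "him", "her"])
print(search("ushers", automaton))
```

Every input character costs at least one table lookup here, which is the throughput limit the rest of the deck sets out to relax.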

4 Increasing Throughput Through Parallelism
• Multiple parallel load-balancing search engines
  - Memory bandwidth intensive
  - Complex packet scheduler
  - Overall cost depends on each single engine
• Make a single search engine scalable
  - A simple pipeline does not work due to the DFA feedback path
  - Superscalar & multi-threading work with a complex packet scheduler
  - Examine multiple bytes or characters per lookup step
• Our goal: improving throughput without exploding the memory
  - Better state machine implementation
  - Better (on-chip and off-chip) memory organization

5 A Naive Realization of Multi-Byte Pattern Matching
Patterns and their fixed 4-byte segmentation:
  s1: technical   = tech | nica | l
  s2: technically = tech | nica | lly
  s3: tel         = tel
  s4: telephone   = tele | phon | e
  s5: phone       = phon | e
  s6: elephant    = elep | hant
[Figure: multi-byte state machine over states q0 to q7, with transitions labeled by blocks such as tech, nica, tele, phon, elep, hant and match outputs for s1 to s6.]
• Input alignment problem: e.g., it can match "phone" but not "iphone"
• Still one character per lookup, but speedup can be achieved by …

6 Deploying Multiple Multi-byte Search Engines
• Replicate the table for different shift offsets
  - Wastes memory storage
• One lookup for each offset
  - Wastes memory bandwidth
• Much previous work can be classified as using this approach: ANCS '05, JSAC '06 …
[Figure: example input stream "technxyzicallyab".]
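
The alignment problem of the naive scheme, and the offset-replication workaround described on this slide, can be illustrated with a small sketch. It assumes a fixed stride of 4 bytes and compares chunk lists directly instead of walking a state machine; `chunks`, `naive_match`, and `replicated_match` are hypothetical helper names, not part of the presented system.

```python
def chunks(data: bytes, stride: int = 4) -> list:
    """Cut data into fixed-size chunks, starting at offset 0."""
    return [data[i:i + stride] for i in range(0, len(data), stride)]

def naive_match(text: bytes, pattern: bytes, stride: int = 4) -> bool:
    """Match only when the pattern's chunks line up with the text's chunks."""
    p, t = chunks(pattern, stride), chunks(text, stride)
    for i in range(len(t) - len(p) + 1):
        full_ok = t[i:i + len(p) - 1] == p[:-1]           # full-size chunks must be equal
        last_ok = t[i + len(p) - 1].startswith(p[-1])     # last chunk may be partial
        if full_ok and last_ok:
            return True
    return False

def replicated_match(text: bytes, pattern: bytes, stride: int = 4) -> bool:
    """Workaround: run one naive engine per shift offset (stride lookups in total)."""
    return any(naive_match(text[o:], pattern, stride) for o in range(stride))

print(naive_match(b"phone", b"phone"))        # aligned input
print(naive_match(b"iphone", b"phone"))       # shifted by one byte
print(replicated_match(b"iphone", b"phone"))  # recovered by trying every offset
```

The replicated version finds the shifted occurrence, but only by repeating the lookup once per offset, which is the storage and bandwidth cost the slide points out.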

7 Amending Bandwidth with Storage (ISCA '06)
• Combining all possible offsets into one state machine
• Leads to memory explosion
  - State fan-out = Sⁿ, where S is the alphabet size and n is the stride
[Figure: DFA for one pattern, "abba", over the alphabet {a, b}.]
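
To give a feel for how quickly the fan-out Sⁿ grows, the following sketch simply evaluates the formula. The alphabet size of 256 (one byte per character) is an assumption made for illustration; the slide's own figure uses the two-letter alphabet {a, b}.

```python
def fan_out(alphabet_size: int, stride: int) -> int:
    """Per-state fan-out when every lookup consumes `stride` characters."""
    return alphabet_size ** stride

# Two-letter alphabet from the slide's figure, then a byte alphabet (assumed).
for size, stride in [(2, 1), (2, 2), (256, 1), (256, 2), (256, 4)]:
    print(f"S={size}, n={stride}: fan-out {fan_out(size, stride)}")
```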

8 Key Idea of Variable Stride DFA (VS-DFA)
• What is the problem with the naive approach?
  - The segments within the source and the target are not aligned
• How do humans recognize string patterns in natural language?
  - Using words as atomic units, separated by spaces and punctuation
  - Examples: "this talk is interesting!" and "I thinkthistalkisboring!"
[Figure: source (data flow) "technxyzicallyab" and signature (to be matched) "technically".]

9 Identifying Atomic Units using Winnowing
• Winnowing [S. Schleimer, et al., SIGMOD '03]
  - Extracts documents' signatures for similarity comparison
• First: hash every k characters, say, k = 2
• Second: select the max hash value within a w-byte sliding window, say, w = 3
• Third (our extension): partition the string into blocks at the positions of the chosen values
[Figure: per-position hash values for the example string "technxyzicallyab", with the selected maxima marking the block boundaries.]
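
The three winnowing steps listed on this slide can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: CRC32 stands in for the unspecified hash function, ties are broken toward the rightmost position, and a block boundary is placed immediately after each selected position (a convention chosen because it yields heads of at most w bytes and tails of at most w+k-2 bytes, the bounds quoted later in the deck). The names `cut_points` and `segment` are hypothetical.

```python
import zlib

def cut_points(data: bytes, k: int = 2, w: int = 3) -> list[int]:
    """Return the sorted byte offsets at which data is cut into blocks (k >= 2)."""
    # Step 1: hash every k consecutive bytes (CRC32 is a stand-in hash).
    hashes = [zlib.crc32(data[i:i + k]) for i in range(len(data) - k + 1)]
    cuts = set()
    # Step 2: in every window of w consecutive hashes, select the maximum;
    # ties go to the rightmost position so the choice is deterministic.
    for start in range(len(hashes) - w + 1):
        best = max(range(start, start + w), key=lambda i: (hashes[i], i))
        # Step 3: cut right after the selected position (assumed convention).
        cuts.add(best + 1)
    return sorted(cuts)

def segment(data: bytes, k: int = 2, w: int = 3) -> list[bytes]:
    """Partition data into variable-size blocks at the winnowed cut points."""
    bounds = [0] + cut_points(data, k, w) + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]

pattern = b"technically"
print(segment(pattern))
print(segment(b"xyz" + pattern + b"ab"))
```

Because each selection depends only on the bytes inside its window, the blocks strictly between the first and last cut of a pattern reappear unchanged wherever the pattern occurs in a stream; only the first and last blocks depend on the surrounding bytes.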

10 Segmenting Strings to Blocks using Winnowing
• Each pattern string is divided into a head block, one or more core blocks, and a tail block
  - The core blocks are context independent
  - The head block and the tail block are context dependent
  - Some short patterns can be coreless or indivisible
• Key idea: use the core blocks to identify the pattern, then use the head and tail to verify the match

Pattern             Head block   Winnowed core blocks   Tail block
s1: ridiculous      r            id | ic | ulo | u      s
s2: authenticate    auth         ent | ica              te
s3: identical       id           ent | ica              l
s4: confident       conf         id                     ent
s5: confidential    conf         id | ent               ial
s6: entire          ent          (empty-core)           ire
s7: set             ---          (indivisible)          ---
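
The head/core/tail division and the two short-pattern cases can be expressed as a small helper. The sketch below assumes the pattern has already been cut into blocks (the block lists are copied from the slide's example table); `split_pattern` and the dictionary layout are illustrative choices, not the paper's data structures.

```python
def split_pattern(blocks: list) -> dict:
    """Classify the winnowed blocks of one pattern into head, core, and tail."""
    if len(blocks) == 1:
        # No cut point fell inside the pattern: it cannot be segmented.
        return {"kind": "indivisible", "head": None, "core": [], "tail": None}
    head, *core, tail = blocks
    kind = "normal" if core else "empty-core"
    return {"kind": kind, "head": head, "core": core, "tail": tail}

# Block lists taken from the slide's example table.
examples = {
    "confidential": ["conf", "id", "ent", "ial"],
    "entire": ["ent", "ire"],
    "set": ["set"],
}
for name, blocks in examples.items():
    print(name, split_pattern(blocks))
```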

11 Building the Variable-Stride DFA
[Figure: the head, core, and tail strings of s1 to s7 compiled into a VS-DFA with states q0, q1, q2, q3, q11, q12, q14, q15 and transitions labeled with the core blocks id, ent, ic, ica, ulo, u.]
Head | tail pairs attached to the matching states:
  s1: r | s
  s2: auth | te
  s3: id | l
  s4: conf | ent
  s5: conf | ial
• Short patterns are handled by TCAM: s6 (ent | ire) and s7 (set)
• A difference from Aho-Corasick is that sometimes this jump could be removed (the note refers to a jump transition in the figure)

12 Pattern Matching System using VS-DFA
[Figure: Data Stream (Payload) -> Winnowing Module -> Blocks Queue -> Block-based State Machine -> Match Result. Example payload: "technxyz icallyab connecti".]
• Winnowing Module: multiple bytes per cycle
• Block-based State Machine: one block per cycle
• Throughput depends on the state machine

13 State Machine Implementation
• VS-DFA comprises two tables: the State Transition Table (STT) and the Match Table (MT)
• Implemented as efficient hash tables

(a) State Transition Table (STT); hash key = (start state, block), value = end state
Start State   Block   End State
q0            id      q14        (start transition)
q0            ent     q1         (start transition)
q14           ic      q2
q3            u       q11
q14           ent     q15
q1            ica     q12
q15           ica     q12
q2            ulo     q3

(b) Match Table (MT)
State   Head   Tail
q14     conf   ent
q15     conf   ial
q12     auth   te
q11     r      s
q12     id     l
(Depth column garbled in extraction: the q12 id/l row reads 2; the other four values read 1, 3, 2, 2 with no recoverable row alignment.)
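
A minimal sketch of how the two hash tables might be used at run time is given below. The STT entries and the head/tail pairs are copied from this slide; everything else is an assumption made for illustration: the fallback order (exact transition, then start transition, then reset to q0), the depth values (taken here to be the number of core blocks of each pattern, since the slide's Depth column did not survive extraction), and verification of head and tail against raw bytes with full lookahead. The input is assumed to be already cut into blocks.

```python
START = "q0"

# (state, core block) -> next state, copied from the slide's STT.
STT = {
    ("q0", b"id"): "q14", ("q0", b"ent"): "q1",
    ("q14", b"ic"): "q2", ("q3", b"u"): "q11",
    ("q14", b"ent"): "q15", ("q1", b"ica"): "q12",
    ("q15", b"ica"): "q12", ("q2", b"ulo"): "q3",
}

# state -> [(head, tail, depth, pattern)]; depth = number of core blocks (assumption).
MT = {
    "q14": [(b"conf", b"ent", 1, "confident")],
    "q15": [(b"conf", b"ial", 2, "confidential")],
    "q12": [(b"auth", b"te", 2, "authenticate"), (b"id", b"l", 2, "identical")],
    "q11": [(b"r", b"s", 4, "ridiculous")],
}

def match_blocks(blocks: list) -> list:
    """Consume one block per step; verify head and tail on candidate matches."""
    text = b"".join(blocks)
    starts, offset, state, found = [], 0, START, []
    for i, block in enumerate(blocks):
        starts.append(offset)          # byte offset where this block begins
        offset += len(block)           # byte offset just past this block
        # Exact transition first, then a start transition, otherwise reset.
        state = STT.get((state, block)) or STT.get((START, block), START)
        for head, tail, depth, name in MT.get(state, []):
            if depth > i + 1:
                continue               # not enough blocks seen yet
            core_start = starts[i - depth + 1]
            head_ok = (core_start >= len(head)
                       and text[core_start - len(head):core_start] == head)
            tail_ok = text[offset:offset + len(tail)] == tail
            if head_ok and tail_ok:
                found.append(name)
    return found

print(match_blocks([b"a ", b"conf", b"id", b"ent", b"ial", b" memo"]))
```

The state machine advances by one variable-size block per lookup, and the byte-level head and tail comparison only runs when the core blocks have already identified a candidate pattern. Empty-core and indivisible patterns such as "entire" and "set" are not covered, since the deck assigns them to the TCAM.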

14 Using TCAM to Handle Short Patterns
• The "empty-core" pattern could still benefit from the segmentation
• An indivisible pattern needs max {w, w+k-2} replications
[Figure: TCAM entries. Empty-core pattern "entire": Head (w bytes) and Tail (w+k-2 bytes). Indivisible pattern: one short entry replicated at shifted positions (extracted as "tes", shown four times).]

15 Defending Against Single-Byte Blocks
• The expected throughput speedup is (w+1)/2
• Prone to Denial-of-Service attacks
  - Single-byte blocks can lower the throughput
  - Adversaries can easily construct repeated single-byte blocks by sending repeated patterns
• We can reduce or even eliminate single-byte blocks by applying the combination rules to the data stream and the patterns at the same time
  - Combining up to w consecutive single-byte blocks into one block
  - Maintaining the block synchronization feature
  - See the paper for details
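
The (w+1)/2 figure can be turned into a rough lookup budget. The sketch below only combines numbers that appear in the deck (a 10 Gbps line rate and the expected speedup formula); the function names are hypothetical, and the result is back-of-the-envelope arithmetic rather than a measured value.

```python
LINE_RATE_BITS_PER_SEC = 10e9  # 10GbE / OC192, as cited in the deck

def expected_speedup(w: int) -> float:
    """Expected bytes consumed per lookup for window size w, per the slide."""
    return (w + 1) / 2

def lookups_per_second(w: int, line_rate_bits: float = LINE_RATE_BITS_PER_SEC) -> float:
    """State-machine lookups per second needed to keep up with the line rate."""
    bytes_per_sec = line_rate_bits / 8
    return bytes_per_sec / expected_speedup(w)

for w in (1, 3, 5, 7):
    print(f"w={w}: speedup {expected_speedup(w)}, {lookups_per_second(w):.3e} lookups/s")
```

With w = 1 every block is a single byte and the speedup is 1, which is the degenerate case an attacker would try to force with repeated single-byte blocks.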

16 Evaluation: Pattern Sets & Memory Efficiency
[Table of pattern sets and memory usage not preserved in the transcript.]
• Snort-full and ClamAV-full also include the fixed strings extracted from the regular expressions (in Snort) or the advanced rules (in ClamAV)

17 Evaluation Results: Tradeoffs of w and k
• Larger w or k results in smaller memory
• Larger w or k results in larger TCAM
• Larger w results in higher throughput
[Figure: results for Snort-fixed; results for ClamAV are similar.]

18 Conclusion & Future Work
• Multi-pattern matching is a key building block of a DPI system
• VS-DFA can process multiple bytes per step with small memory size and memory bandwidth consumption
• A single VS-DFA search engine can support 10Gbps+ throughput
• Future work
  - Find other segmentation algorithms, instead of Winnowing, that are more suitable for our application
  - Use a larger stride for higher throughput without incurring the short-pattern penalty
  - Extend the algorithm to support regular expression matching

