A High Throughput String Matching Architecture for Intrusion Detection and Prevention Lin Tan U of Illinois, Urbana Champaign Tim Sherwood UC, Santa Barbara.

2 Outline
Why String Matching
– Matching against multiple strings
The Aho-Corasick Algorithm
– The Devil in the Constants
A Bit-Split Algorithm
Hardware Design and Analysis
Conclusions

3 To Protect and Serve
Our machines are constantly under attack.
We cannot rely on end users; we need networks that actively defend themselves.
This requires the protection system to operate at 10 to 40 Gb/s (we aim at current and next-generation networks).
IDS/IPS are promising ways of providing protection.
Market for such systems: $918.9 million by the end of 2007.
Snort: a widely accepted open-source IDS.

4 Our Contributions
String Matching Architecture:
– 0.4MB and 10Gbps for the Snort rule set (>10,000 characters)
Bit-Split String Matching Algorithm:
– Reduces the out edges per state from 256 to 2
Performance/area beats the best techniques we examined by a factor of 10 or more.

5 Scanning for Intrusions
Most IDSs define a set of rules; a string in a rule defines a suspicious transmission.
– Example, CodeRed worm: web flow established, uricontent with "/root.exe"
We are not building a full IDS; rather, we are building the primitives from which full systems can be built.
[Figure: traffic in, scanned by a software IDS, traffic out]

6 Multiple String Matching
The multiple string matching problem:
– Input: a set of strings/patterns S and a buffer b
– Output: every occurrence of an element of S in b
– Extra constraint: b is really a stream
A string can be anywhere in the payload of a packet.
How to implement:
– Option 1: search for each string independently
– Option 2: combine the strings and search for all of them at once
[Figure: an example input stream scanned against a small set of strings]
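To make Option 1 concrete, here is a minimal sketch of searching for each string independently. `find_all_independent` is a hypothetical helper, not part of the original design; it reports each match as (index of the last character, pattern), which is how a left-to-right scanner would see it.

```python
def find_all_independent(patterns, buffer):
    """Option 1: scan the whole buffer once per pattern (hypothetical helper)."""
    matches = []
    for pattern in patterns:
        start = buffer.find(pattern)
        while start != -1:
            # Record (end index, pattern) so results line up with a left-to-right scan.
            matches.append((start + len(pattern) - 1, pattern))
            # Restart one character later so overlapping occurrences are found too.
            start = buffer.find(pattern, start + 1)
    return sorted(matches)


print(find_all_independent(["he", "she", "his", "hers"], "ushers"))
```

The work grows with the number of patterns times the buffer length, and every pattern needs its own pass over b, which fits poorly with b being a stream. Option 2 is what Aho-Corasick provides.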

7 Why Hardware
Snort: >1,000 rules, growing at 1 rule/day or more.
Active research into automated rule building.
Strings are not limited to just [a-z]+.
We need a high-speed string matching technique with stringent worst-case performance; many algorithms target average-case performance.
Aho-Corasick can scan the input once and output all matches, but it is too big to be on-chip.

9 The Aho-Corasick Algorithm
Given a finite set P of patterns, build a deterministic finite automaton G that, in a single pass over the input, reports every occurrence of every pattern in P.

10 An AC Automaton Example
Example: P = {he, she, his, hers}
[Figure: AC automaton for P with initial state 0 and accepting states 2, 5, 7 and 9; goto edges 0→1 (h), 1→2 (e), 2→8 (r), 8→9 (s), 1→6 (i), 6→7 (s), 0→3 (s), 3→4 (h), 4→5 (e), together with the other transitions of the state transition function. Edges pointing back to state 0 are not shown.]
The construction: linear time.
The search for all patterns in P: linear time.
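The construction above can be sketched in Python. This is a minimal textbook version of Aho-Corasick (goto trie, breadth-first failure links, inherited outputs), not the authors' code; `build_aho_corasick` and `search` are hypothetical names. Inserting the patterns in the order he, she, his, hers reproduces the state numbering of the example.

```python
from collections import deque


def build_aho_corasick(patterns):
    """Build goto, failure and output tables for a set of patterns.

    States are numbered in insertion order, so for ["he", "she", "his", "hers"]
    they follow the example figure: 2 = he, 5 = she, 7 = his, 9 = hers.
    """
    goto = [{}]        # goto[state][char] -> child state (trie edges only)
    output = [set()]   # output[state] -> patterns recognized in this state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Breadth-first order guarantees a state's failure target is finished first.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            output[child] |= output[fail[child]]   # e.g. "she" also reports "he"
            queue.append(child)
    return goto, fail, output


def search(patterns, text):
    """Single left-to-right pass; returns (end index, pattern) for every match."""
    goto, fail, output = build_aho_corasick(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]          # follow failure links on a mismatch
        state = goto[state].get(ch, 0)
        matches.extend((i, p) for p in sorted(output[state]))
    return matches


print(search(["he", "she", "his", "hers"], "ushers"))
```

Each input character is read once; the failure links are what let the automaton report "she" and "he" at the same position without rescanning.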

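11 Linear Time: So What's the Problem?
[Figure: next-state pointer table with one row per state (0 to 16,383) and one column per input byte (0 to 255)]
How to implement it on chip?
Problem: the size is too big to be on-chip
– ~10,000 nodes
– 256 out edges per node
– Requires 16,384 × 256 × 14 bits ≈ 58.7 Mbit, about 7.3 MB for the next-state pointers alone (a 14-bit pointer addresses 16,384 = 2^14 states)
Solution: partition into small state machines
– Fewer strings per machine
– Fewer out edges per machine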
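The table-size estimate can be checked with a few lines of arithmetic. A minimal sketch, assuming one pointer per (state, input byte) pair, pointers just wide enough to address every state, and no storage counted for match information:

```python
import math

nodes = 10_000                                   # approximate AC states for the Snort strings
states = 2 ** math.ceil(math.log2(nodes))        # 16,384 table rows
pointer_bits = math.ceil(math.log2(states))      # 14-bit next-state pointers
out_edges = 256                                  # one pointer per possible input byte

table_bits = states * out_edges * pointer_bits
print(f"{states} x {out_edges} x {pointer_bits} bits = {table_bits:,} bits "
      f"= {table_bits / 8 / 1e6:.1f} MB")
```

Roughly 7 MB for the pointer table alone is far more than fits in on-chip memory, which is what motivates cutting both the number of strings and the number of out edges per machine.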

13 Our Main Idea: Bit-Split
Partition the rules (P) into smaller sets (P0 to Pn).
Build an AC state machine for each subset.
Rip each DFA Pi apart into 8 tiny state machines (Bi0 through Bi7), each of which searches for 1 bit of the 8-bit encoding of an input character.
– Only if all the different B machines agree can there actually be a match.

14 Binary Encoding
P0 = {he, she, his, hers}
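The bit-split machines each see one bit of every character's 8-bit code. A minimal sketch that prints the ASCII encodings of the characters in P0 and extracts single bits; numbering bit 0 as the most significant bit is an assumption, chosen because it is the numbering under which the bit-split example's state sets work out. `bit` is a hypothetical helper.

```python
PATTERNS = ["he", "she", "his", "hers"]


def bit(ch, i):
    """Bit i of the 8-bit ASCII code of ch, counting bit 0 as the most significant."""
    return (ord(ch) >> (7 - i)) & 1


for ch in sorted(set("".join(PATTERNS))):
    code = format(ord(ch), "08b")
    print(ch, code[:4], code[4:], "bit 3:", bit(ch, 3), "bit 4:", bit(ch, 4))
```

For example, h is 0110 1000, so its bit 3 is 0 and its bit 4 is 1.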

15 An Example of Bit-Split
P0 = {he, she, his, hers}
[Figure: the AC automaton for P0 (edges pointing back to state 0 are not shown) next to B03, the bit-split machine that reads only bit 3 of each input character. Each B03 state is labeled with the set of AC states it may correspond to: b0 {0}, b1 {0,1}, b2 {0,3}, b3 {0,1,2,6}, b4 {0,1,4}, b5 {0,3,7,8}, b6 {0,1,2,5,6}, b7 {0,3,9}.]

16 Compact State Set
P0 = {he, she, his, hers}
Each B03 state only needs to keep the accepting AC states in its set: b3 {2}, b5 {7}, b6 {2,5}, b7 {9}; b0, b1, b2 and b4 keep the empty set.
[Figure: B03 with compact state sets; edges pointing back to state 0 are not shown.]

17 An Example of Bit-Split
P0 = {he, she, his, hers}
[Figure: the AC automaton for P0 (edges pointing back to state 0 are not shown) with two of its bit-split machines and their compact state sets: B03 (b0 to b7; b3 {2}, b5 {7}, b6 {2,5}, b7 {9}) and B04 (b0 to b9; b4 {2}, b6 {2,5}, b8 {2,7}, b9 {9}).]
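The construction behind these three slides can be reproduced with a subset (powerset) construction. A minimal sketch, assuming ASCII input bytes and bit 0 = most significant bit: build the AC automaton as a full 256-way DFA, then give each bit-split state the set of AC states it could correspond to, with exactly two out edges. `build_dfa`, `bit` and `bit_split` are hypothetical names; this is an illustration of the idea, not the paper's implementation.

```python
from collections import deque

PATTERNS = ["he", "she", "his", "hers"]


def build_dfa(patterns):
    """Aho-Corasick as a full DFA: delta[state][byte] -> next state."""
    goto, accepting = [{}], set()
    for p in patterns:
        s = 0
        for b in p.encode("ascii"):
            if b not in goto[s]:
                goto.append({})
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        accepting.add(s)                        # state where a pattern ends
    delta = [[0] * 256 for _ in goto]
    fail = [0] * len(goto)
    queue = deque([0])
    while queue:                                # BFS: failure rows are filled first
        s = queue.popleft()
        for b in range(256):
            if b in goto[s]:
                child = goto[s][b]
                fail[child] = delta[fail[s]][b] if s else 0
                delta[s][b] = child
                queue.append(child)
            else:
                delta[s][b] = delta[fail[s]][b] if s else 0
    return delta, accepting


def bit(byte, i):
    return (byte >> (7 - i)) & 1                # assumption: bit 0 = most significant


def bit_split(delta, i):
    """Subset construction for B_i: each state is a set of AC states."""
    sets = [frozenset({0})]
    index = {sets[0]: 0}
    trans = []
    k = 0
    while k < len(sets):
        row = []
        for value in (0, 1):                    # only two out edges per state
            nxt = frozenset(delta[s][b] for s in sets[k]
                            for b in range(256) if bit(b, i) == value)
            if nxt not in index:
                index[nxt] = len(sets)
                sets.append(nxt)
            row.append(index[nxt])
        trans.append(row)
        k += 1
    return sets, trans


delta, accepting = build_dfa(PATTERNS)
for i in (3, 4):
    sets, _ = bit_split(delta, i)
    print(f"B0{i}:", [sorted(s) for s in sets])
    print("  compact:", [sorted(s & accepting) for s in sets])
```

With this bit numbering the states are discovered in the same order as b0 to b7 (B03) and b0 to b9 (B04), and intersecting each set with the accepting states {2, 5, 7, 9} yields the compact sets. Both machines stay within the 10 states of the AC automaton.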

18 Nice Properties
The number of states in Bij is rigorously bounded by the number of states in Pi.
No exponential blow-up in the number of states.
Linear construction time.
It is possible to traverse multiple edges at a time to multiply throughput.

19 Matching on the Example
Input stream: hxhers
Only scan the input stream once.
[Figure: the AC automaton for P0 = {he, she, his, hers}]

20 Matching on the Example
Input: hxhe. B03 reads bit 3 of each character (0100) and B04 reads bit 4 (1110).
[Figure: B03 and B04 traversing the input; B03 ends in b6 {2,5} and B04 ends in b8 {2,7}.]
How do you "combine" the results from the different state machines?
Only if all the state machines agree is there actually a match: {2,5} ∩ {2,7} = {2}, i.e. "he".
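Each bit-split machine does one two-way table lookup per input character. A minimal sketch that writes B03 out as an explicit table: the transitions are reconstructed from the state sets shown for B03 (assuming bit 3 counts from the most significant bit), and B04's final compact set {2,7} is taken from the slide rather than recomputed. `B03_NEXT`, `B03_MATCHES` and `run_b03` are hypothetical names.

```python
# B03 as a table: each state has only two out edges (next on bit 0, next on bit 1).
B03_NEXT = {
    "b0": ("b1", "b2"), "b1": ("b3", "b2"), "b2": ("b4", "b2"), "b3": ("b3", "b5"),
    "b4": ("b6", "b2"), "b5": ("b4", "b7"), "b6": ("b3", "b5"), "b7": ("b4", "b2"),
}
# Compact state sets: accepting AC states (2 = he, 5 = she, 7 = his, 9 = hers).
B03_MATCHES = {"b3": {2}, "b5": {7}, "b6": {2, 5}, "b7": {9}}


def bit3(ch):
    return (ord(ch) >> 4) & 1   # bit 3, counting bit 0 as the most significant bit


def run_b03(text):
    """Feed bit 3 of each character through B03; return final state and its set."""
    state = "b0"
    for ch in text:
        state = B03_NEXT[state][bit3(ch)]
    return state, B03_MATCHES.get(state, set())


state, partial = run_b03("hxhe")
b04_partial = {2, 7}            # B04's compact set after "hxhe", as shown on the slide
print(state, partial, partial & b04_partial)
```

B03 alone can only say "he or she"; intersecting with B04's "he or his" leaves exactly state 2, the match for he.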

21 How to Implement
The AC state machine is equivalent to the 8 tiny state machines.
The 8 tiny state machines can run independently, which means in parallel.
The intersection is done with a bitwise AND.
Splitting into 8 one-bit machines is intuitive but not optimal.
How do we build a system to implement this algorithm?
– Our algorithm makes it feasible to be on-chip.

22 A Hardware Implementation
A rule module is equivalent to an AC state machine.
All rule modules are structurally identical, as are all tiles.
One tile stores one tiny bit-split state machine.
All full match vectors are concatenated to indicate which strings are matched.
[Figure: the String Match Engine. Each 8-bit byte from the payload is sent to Rule Modules 0 to N. Inside a rule module, Tiles 0 to 3 each take 2 bits of the byte ([0:1], [2:3], [4:5], [6:7]) and each emits a 16-bit partial match vector; these are combined into the module's 16-bit full match vector, and the full match vectors of all modules form the complete set of matches for all rules. A state machine tile is a table with rows 0 to 255, selected through a decoder by the current state; each row holds four 8-bit next-state pointers, one of which is chosen by a 4:1 mux on the 2-bit input, and a 16-bit partial match vector sent to an output latch. A control block loads the configuration data.]
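The tile organization can be imitated in software. A minimal sketch, assuming bit 0 is the most significant bit, that Tile t reads bits [2t:2t+1], and that the four partial match vectors are combined with a bitwise AND as described for the bit-split algorithm. Each tile state's partial match vector is the OR of the match bits of the AC states it represents. `build_dfa`, `build_tile` and `scan` are hypothetical names, and the sketch does not enforce the 256-state, 16-string limits of a real tile.

```python
from collections import deque

PATTERNS = ["he", "she", "his", "hers"]   # one rule module's strings (at most 16)


def build_dfa(patterns):
    """Full 256-way Aho-Corasick DFA; out[s] is a bit vector of patterns matched in s."""
    n = len(patterns)
    goto, out = [{}], [0]
    for k, p in enumerate(patterns):
        s = 0
        for b in p.encode("ascii"):
            if b not in goto[s]:
                goto.append({})
                out.append(0)
                goto[s][b] = len(goto) - 1
            s = goto[s][b]
        out[s] |= 1 << (n - 1 - k)          # pattern 0 is the leftmost bit
    delta, fail, queue = [[0] * 256 for _ in goto], [0] * len(goto), deque([0])
    while queue:
        s = queue.popleft()
        for b in range(256):
            if b in goto[s]:
                child = goto[s][b]
                fail[child] = delta[fail[s]][b] if s else 0
                out[child] |= out[fail[child]]  # inherit matches of the failure state
                delta[s][b] = child
                queue.append(child)
            else:
                delta[s][b] = delta[fail[s]][b] if s else 0
    return delta, out


def build_tile(delta, out, t):
    """Subset construction for the tile reading bits [2t:2t+1] of each byte."""
    sets, table = [frozenset({0})], []
    index = {sets[0]: 0}
    k = 0
    while k < len(sets):
        row = []
        for v in range(4):                  # four out edges per tile state
            nxt = frozenset(delta[s][b] for s in sets[k] for b in range(256)
                            if (b >> (6 - 2 * t)) & 3 == v)
            if nxt not in index:
                index[nxt] = len(sets)
                sets.append(nxt)
            row.append(index[nxt])
        table.append(row)
        k += 1
    pmv = [0] * len(sets)
    for i, st in enumerate(sets):
        for s in st:
            pmv[i] |= out[s]                # partial match vector: OR over AC states
    return table, pmv


def scan(text, patterns=PATTERNS):
    """Return the full match vector produced after each input byte."""
    delta, out = build_dfa(patterns)
    tiles = [build_tile(delta, out, t) for t in range(4)]
    states, full = [0] * 4, []
    for b in text.encode("ascii"):
        vector = (1 << len(patterns)) - 1
        for t, (table, pmv) in enumerate(tiles):
            states[t] = table[states[t]][(b >> (6 - 2 * t)) & 3]
            vector &= pmv[states[t]]        # full match vector = AND of the PMVs
        full.append(vector)
    return full


print([format(v, "04b") for v in scan("hxhe")])
```

On hxhe the full match vector stays 0000 for the first three characters and becomes 1000 (the string he) on the fourth, the same outcome as the cycle-by-cycle trace of the efficient implementation.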

23 An Efficient Implementation
[Figure: contents of Tiles 0 to 3 for P0 = {he, she, his, hers}; each state row lists its next states for the 2-bit inputs 00, 01, 10 and 11 and its partial match vector (PMV). Below, a cycle-by-cycle trace of the input hxhe (Cycles 0 to 3) shows the 2-bit slice each tile reads and the PMV it outputs; the full match vector is 0000 in Cycles 0 to 2 and 1000 in Cycle 3, when he is matched.]

24 Performance of Hardware
Key metric: throughput × characters / area

25 Related Work
Software-based
– Good for ~100Mb/s in the common case
FPGA-based
– Many schemes map rules down to a specialized circuit, with near-optimal utilization of hardware resources
– Implementing state machines on block RAMs [Cho and Mangione-Smith]
– Concurrent to our work: mapping state machines to on-chip SRAM [Aldwairi et al.]
– Bloom filters [Dharmapurikar et al.]: an excellent filter in the common case
TCAM-based
– Require all patterns to be no longer than the TCAM width
– Cutting long patterns: 2Gbps with a 295KB TCAM [Yu et al.]

26 Conclusions
New tile-based architecture:
– 0.4MB and 10Gbps for the Snort rule set (>10,000 characters)
– Possible to use for other applications, e.g. IP lookups and packet classification
New bit-split algorithm:
– General-purpose enough for many other applications, e.g. spam detection, peephole optimization, IP lookups, packet classification
– Feasible to implement on other tile-based architectures

27 Thank you! Questions?
