FPGA Based String Matching for Network Processing Applications
Janardhan Singaraju, John A. Chandy
Presented by: Justin Riseborough, Albert Tirtariyadi

ENGG*3050 RCS, Winter 2014
March 24, 2014

Contents
• Introduction
• String Lookup Cache
  ◦ Architectures
  ◦ System Interaction
  ◦ Systems Comparison
• Network Intrusion Detection
  ◦ Architectures
  ◦ System Interaction
  ◦ Implementations
• Critique

Keywords
• Network processing
• String matching
• Content Addressable Memory (CAM) & cache
• Bottlenecks
• Fixed-size/non-fixed-size keys
• Cascading, propagating
• Parallelism

Introduction
• String matching is used in search engines and in network intrusion detection
• Network processing applications require frequent string matching against specific keywords
• As networks get faster, it becomes harder for general-purpose processors (GPPs) to keep up
• Bottlenecks lie both in memory and in slow algorithms and implementation methods

Current Implementations
Software algorithms
• Rabin-Karp
  ◦ Compares a hash of each input window with the hash of the pattern instead of matching characters directly
• Knuth-Morris-Pratt
  ◦ Matches character by character; on a mismatch, a precomputed prefix table lets it skip ahead without re-examining input characters
• Boyer-Moore
  ◦ Uses precomputed functions to determine how far the pattern can be shifted
Hardware implementations
• Finite automata methods
  ◦ Translate finite automata graphs into FPGA circuitry
• CAMs
• Caches and lookup tables
• Cellular automata
• Finite state machines
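For reference, two of the software algorithms listed above can be sketched in Python. This is a minimal illustration, not code from the paper: `kmp_search` builds the Knuth-Morris-Pratt prefix (failure) table so input characters are never re-read, and `rabin_karp_search` compares a rolling hash of each text window with the pattern's hash, confirming character by character on a hash hit. The function names and the `base`/`mod` hash parameters are our own choices.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern (Knuth-Morris-Pratt)."""
    if not pattern:
        return []
    # failure[i] = length of the longest proper prefix of pattern[:i + 1] that is also its suffix
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    hits, k = [], 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back in the pattern; the text index never moves backwards
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = failure[k - 1]  # allow overlapping matches
    return hits


def rabin_karp_search(text: str, pattern: str,
                      base: int = 256, mod: int = 1_000_000_007) -> list[int]:
    """Return the start index of every occurrence of pattern (Rabin-Karp rolling hash)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for s in range(n - m + 1):
        # Hashes are compared first; the slice comparison rules out hash collisions.
        if p_hash == t_hash and text[s:s + m] == pattern:
            hits.append(s)
        if s + m < n:
            t_hash = ((t_hash - ord(text[s]) * high) * base + ord(text[s + m])) % mod
    return hits
```

Both return every start offset, including overlapping ones; for example, searching "AFOOBARFOO" for "FOO" gives offsets 1 and 7 with either function.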

STRING LOOKUP CACHE (Section I)

String Lookup Cache
• Hardware implementation based on CAMs, cellular automata, and caching
• Caches retain frequently used values, reducing the need to repeatedly look up address values
• Compatible with parallel processing, prefix sharing, and pattern partitioning
• Very high throughput with low area overhead
• The drawback of CAMs and hardware caches is their reliance on fixed-size keys
  ◦ Supporting non-fixed-size keys requires additional overhead

System Architecture (figure)

Content Addressable Memory
• Hardware implementation of an associative array (ADT), organized as a 2D array of cells
• In VLSI, the cells are built from transistors
• In an FPGA, the storage cells are registers and the comparators are XOR gates
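A behavioural sketch of the FPGA mapping described above, assuming an 8-bit key width and using hypothetical function names (`xor_row_match`, `cam_search`): each stored word sits in a register, one XOR gate per bit compares it with the search key, and a row matches when every XOR output is zero. Software necessarily loops over the rows; the hardware evaluates all rows in the same clock cycle.

```python
def xor_row_match(stored: int, key: int, width: int = 8) -> bool:
    """One CAM row: an XOR gate per bit, then a NOR over the XOR outputs.
    The row matches only when no XOR output is 1, i.e. all bits are equal."""
    xor_outputs = [((stored >> b) ^ (key >> b)) & 1 for b in range(width)]
    return not any(xor_outputs)


def cam_search(entries: list[int], key: int, width: int = 8) -> list[int]:
    """Return the address of every stored entry equal to key.
    (Sequential here; a real CAM compares all rows in parallel.)"""
    return [addr for addr, stored in enumerate(entries)
            if xor_row_match(stored, key, width)]


if __name__ == "__main__":
    # Hypothetical contents: registers holding the 8-bit codes of 'F', 'O', 'B', 'O'.
    table = [ord(c) for c in "FOBO"]
    print(cam_search(table, ord("O")))  # [1, 3]
```

Returning every matching address (rather than only the first) mirrors a CAM's per-row match lines; a priority encoder would normally reduce them to one address.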

CAM as Character Match Array (CMA)
• Takes characters from the network processor on successive clock cycles
• Each column corresponds to one character of a keyword
• The input character is applied simultaneously to all n columns
• A column's match signal goes high if all input bits match the stored character
• An extra storage cell marks the end of each keyword
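A minimal model of one CMA cycle, assuming a hypothetical array that stores "FOO" and "BAR" back to back with one column per character and 8-bit characters. Each column compares the broadcast input character with its stored character and also exposes its end-of-keyword bit, which later stages would use; `cma_cycle`, `COLUMNS` and `END_OF_KEYWORD` are illustrative names only.

```python
# Hypothetical CMA contents: keywords "FOO" and "BAR" stored back to back,
# one column per character.
COLUMNS = "FOOBAR"
# Extra storage cell per column that marks the last character of each keyword.
END_OF_KEYWORD = [False, False, True, False, False, True]


def cma_cycle(ch: str, columns: str = COLUMNS,
              end_bits: list[bool] = END_OF_KEYWORD) -> list[tuple[bool, bool]]:
    """Apply one 8-bit input character to every column in the same cycle.

    Each column reports (match, end_of_keyword); match is high only when the
    XOR of the input with the stored character is zero in all 8 bits.
    """
    code = ord(ch) & 0xFF
    return [(((ord(stored) ^ code) & 0xFF) == 0, end)
            for stored, end in zip(columns, end_bits)]


if __name__ == "__main__":
    # Successive clock cycles present successive stream characters.
    for cycle, ch in enumerate("FOB"):
        print(cycle, ch, [int(match) for match, _ in cma_cycle(ch)])
```

Note that a column match on its own says nothing about a whole keyword: the 'O' input raises both 'O' columns of "FOO". Combining matches across cycles is the job of the PE array.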

Processor Element (PE) Array
• An array of finite state machines that carries out the approximate match algorithm
• May hold multiple keywords from the CAM
• Takes the match signals from the CAM and sets PE flags, which are forwarded to subsequent PEs
• Evaluates the entire input string in time linear in the length of the input stream
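A simplified software model of the CMA feeding the PE array, under our own assumptions: keywords are stored back to back with one PE per stored character, and a keyword is reported at the PE holding its last character. On each clock cycle a PE's flag is set when its column matches and either it holds the first character of a keyword or the preceding PE's flag was set on the previous cycle. This models exact matching only (the approximate-matching behaviour of the paper's PEs is not reproduced), and `pe_array_scan` is a hypothetical name. One character is consumed per cycle, so the number of cycles equals the stream length.

```python
def pe_array_scan(keywords: list[str], stream: str) -> list[tuple[int, int]]:
    """Return (cycle, pe_index) for every keyword occurrence in the stream.

    Assumes keywords are stored back to back, one PE per stored character, and
    reports the PE holding the keyword's last character. Exact matching only.
    """
    columns = "".join(keywords)
    starts, ends = set(), set()
    pos = 0
    for kw in keywords:
        starts.add(pos)                 # first PE of a keyword needs no predecessor
        ends.add(pos + len(kw) - 1)     # end-of-keyword storage cell
        pos += len(kw)

    flags = [False] * len(columns)
    hits = []
    for cycle, ch in enumerate(stream):          # one character per clock cycle
        match = [ch == c for c in columns]       # CMA column match lines
        # New flags are built from the previous cycle's flags (one register per PE).
        flags = [match[i] and (i in starts or flags[i - 1]) for i in range(len(columns))]
        hits.extend((cycle, i) for i in sorted(ends) if flags[i])
    return hits


if __name__ == "__main__":
    print(pe_array_scan(["FOO", "BAR"], "AFOOBARCD"))  # [(3, 2), (6, 5)]
```

Because each PE only looks at its own column and its neighbour's registered flag, the hardware version needs no backtracking, which is why the stream is processed in linear time.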

CMA and PE Interaction (figure)

Map Table and Outputs
• The map table takes the PE number and outputs the address of the value, or an indirect pointer to the value object
• The map table has as many slots as there are PEs
• Long keywords can leave holes (unused slots) in the map table
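A sketch of how holes can arise, under two assumptions of ours: keywords occupy consecutive PEs, and only the PE holding a keyword's last character indexes the map table. The function name `build_map_table` and the value addresses (0x100, 0x200) are hypothetical. Every other PE a keyword occupies leaves an unused slot, so longer keywords mean more holes.

```python
from typing import Optional


def build_map_table(keywords: list[str], addresses: list[int]) -> list[Optional[int]]:
    """One slot per PE (assumption: keywords sit in consecutive PEs).

    Only the slot of the PE holding a keyword's last character receives that
    keyword's value address; the keyword's other slots stay None (holes).
    """
    table: list[Optional[int]] = [None] * sum(len(k) for k in keywords)
    pe = 0
    for keyword, address in zip(keywords, addresses):
        pe += len(keyword)
        table[pe - 1] = address
    return table


if __name__ == "__main__":
    # Hypothetical keywords and value addresses.
    table = build_map_table(["FOO", "BARBAZ"], [0x100, 0x200])
    holes = sum(slot is None for slot in table)
    print(table[2], table[8], holes)  # 256 512 7
```

Here nine PEs hold two keywords, so seven of the nine slots are empty; this is the area cost of sizing the table by PE count rather than keyword count.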

System Interaction (figure)

Implementations Comparison
FPGA implementation: Xilinx Virtex-II Pro FPGA (XC2VP230-7)
  Number of characters      256      512      1024
  Slices                    2403     4812     9880
  Frequency (MHz)           380.1    476.9    460.2
  Throughput (Gb/s)         12.2     15.3     14.7
  Searches per second       254 M    318 M    307 M
Software implementation: 1 GHz PowerPC computer
  Number of characters      256      512      1024
  Time per search (ns)      1128     1305     1582
  Throughput (Gb/s)         0.043    0.037    0.030
  Searches per second       887 K    766 K    632 K
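The comparison figures can be cross-checked with a little arithmetic. This sketch uses only numbers transcribed from the table; the observation that the FPGA design consumes roughly 32 bits (4 characters) per clock comes from dividing throughput by frequency and is our inference, not something stated on the slide.

```python
# Figures transcribed from the comparison table (256, 512 and 1024 characters).
FPGA_MHZ = [380.1, 476.9, 460.2]
FPGA_GBPS = [12.2, 15.3, 14.7]
FPGA_SEARCHES = [254e6, 318e6, 307e6]
SW_NS_PER_SEARCH = [1128, 1305, 1582]
SW_SEARCHES = [887e3, 766e3, 632e3]

# Throughput / clock frequency = bits consumed per clock cycle (inferred, not stated).
bits_per_clock = [g * 1e9 / (f * 1e6) for g, f in zip(FPGA_GBPS, FPGA_MHZ)]

# FPGA searches per second relative to the 1 GHz PowerPC software.
speedup = [f / s for f, s in zip(FPGA_SEARCHES, SW_SEARCHES)]

# Software searches per second should be roughly 1 / (time per search).
sw_from_latency = [1e9 / t for t in SW_NS_PER_SEARCH]

if __name__ == "__main__":
    print("bits per clock:", [round(b, 1) for b in bits_per_clock])
    print("speedup:", [round(s) for s in speedup])
    print("software searches/s from latency:", [round(x) for x in sw_from_latency])
```

The numbers are internally consistent: about 32 bits per clock on the FPGA, a speedup of roughly 286x, 415x and 486x in searches per second, and software search rates that match the reciprocal of the time per search to within 1%.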

NETWORK INTRUSION DETECTION (Section II)

Network Intrusion Detection
• The process of identifying and analyzing packets that may contain threats to the organization’s network
• A time-consuming process whose cost grows quickly as the rule set or signature set grows
• String matching is the most computationally intensive part of intrusion detection
  ◦ Every incoming packet is compared against several predefined signatures

Problems in the CAM Architecture
• CAM-based designs cannot easily handle regular expressions
• NIDS signatures are not of a fixed size
  ◦ E.g., the CAM contains FOO and BAR and the input stream is AFOOBARCD. With a 3-character key size, the comparisons are made against AFO, OBA and RCD; none of these match, so both signatures slip right through the detection system
• CAM arrays take up a large amount of area
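The misalignment example can be reproduced in a few lines. `aligned_matches` mimics a CAM fed fixed 3-character, boundary-aligned chunks, while `unaligned_matches` checks every starting offset; both are illustrative helpers of our own, not part of the paper's design.

```python
def aligned_chunks(stream: str, width: int) -> list[str]:
    """Cut the stream into fixed-width, boundary-aligned chunks (the CAM's view)."""
    return [stream[i:i + width] for i in range(0, len(stream), width)]


def aligned_matches(stream: str, keywords: list[str], width: int) -> list[str]:
    """Only chunks that happen to equal a stored keyword are detected."""
    return [chunk for chunk in aligned_chunks(stream, width) if chunk in keywords]


def unaligned_matches(stream: str, keywords: list[str]) -> list[tuple[int, str]]:
    """Test every starting offset, as a detector for unaligned streams must."""
    return [(i, kw) for i in range(len(stream))
            for kw in keywords if stream.startswith(kw, i)]


if __name__ == "__main__":
    keywords = ["FOO", "BAR"]
    print(aligned_chunks("AFOOBARCD", 3))             # ['AFO', 'OBA', 'RCD']
    print(aligned_matches("AFOOBARCD", keywords, 3))  # []
    print(unaligned_matches("AFOOBARCD", keywords))   # [(1, 'FOO'), (4, 'BAR')]
```

Both signatures start at offsets that are not multiples of 3, which is exactly why the aligned lookup misses them.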

Proposed Solution
• Use discrete comparators instead of CAMs
  ◦ This sacrifices the ability to update signatures dynamically, a fair tradeoff since signatures change relatively infrequently
• Use p rows of comparators for parallelism, matching p characters in one clock cycle
• Drop the aligned-keyword approach, since incoming streams may not be aligned to a fixed-size boundary
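The comparator stage can be sketched as follows, assuming one hardwired comparator per distinct signature character in each of the p rows. This sharing scheme is a common choice but is our assumption, not a confirmed detail of the paper's layout, and `build_comparators`, `compare_word` and `clock_cycles` are hypothetical names. Because the constants are fixed when the circuit is generated, changing a signature means regenerating the circuit, which is the dynamic-update tradeoff noted above.

```python
def build_comparators(signatures: list[str]) -> list[str]:
    """Hardwired comparator constants: one per distinct character used by any signature
    (assumption: comparators are shared across signatures)."""
    return sorted(set("".join(signatures)))


def compare_word(word: str, comparators: list[str]) -> list[dict[str, bool]]:
    """Row r compares input character word[r] against every constant in the same cycle,
    so p rows consume p characters per clock with no alignment assumption."""
    return [{c: ch == c for c in comparators} for ch in word]


def clock_cycles(stream: str, p: int) -> list[str]:
    """Split the stream into the p-character words presented on successive cycles."""
    return [stream[i:i + p] for i in range(0, len(stream), p)]


if __name__ == "__main__":
    comps = build_comparators(["FOO", "BAR"])
    print("comparators per row:", comps, "total for p = 4:", 4 * len(comps))
    for word in clock_cycles("AFOOBARCD", p=4):
        lines = compare_word(word, comps)
        print(word, [[c for c, hit in row.items() if hit] for row in lines])
```

The per-row match lines are all that later logic needs; the signature match processor then decides which combinations of rows and cycles complete a signature.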

System Architecture (figure)

Processor Architecture (figures)

Processor Element Flow
• Start at the beginning of the signature
• Each PE's output depends on the previous PE and the current PE
• If both the previous PE's signal and the current PE's signal indicate a match, the match signal propagates toward the end of the signature
• At the end of the signature, if the entire signature matched, the sig_match output is asserted

Signature Match Processor Example
• The input string ‘144’ is processed over 2 clock cycles
• ‘1’ is checked in the first cycle and sets off a match signal into the SMA
• ‘4’ is checked in the second cycle and sets off a match signal into the SMA
• The match signal for ‘1’ is still present from the previous clock cycle

Signature Match Processor Example (continued)
• The ‘4’ appears twice, so the first ‘4’ simply propagates its match signal to the second as a carry
• Since this is the end of the signature, the output is a match (propagated match signal && sig_end)
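The Processor Element flow and the ‘144’ example can be modelled in software. This sketch assumes p = 2 characters per clock and a hypothetical stream "x144", so that ‘1’ arrives in the first cycle and both ‘4’s in the second; `smp_scan` and the `carry` register are our own names, not the paper's. Within a cycle the match chain ripples from row to row (the duplicated ‘4’ carry), and the last row's state is registered for the next cycle.

```python
def smp_scan(signature: str, stream: str, p: int) -> list[int]:
    """Return the clock cycles in which sig_match fires for one signature.

    carry[j] is True when signature[:j + 1] matched ending at the last
    character of the previous cycle (the registered PE outputs).
    """
    k = len(signature)
    carry = [False] * k
    fired = []
    for cycle, start in enumerate(range(0, len(stream), p)):
        word = stream[start:start + p]   # the p characters seen by the p comparator rows
        prev = carry                     # row 0 chains from last cycle's registered state
        hit = False
        for ch in word:
            # PE j matches if its character matches and PE j-1 matched one character earlier.
            row = [ch == signature[j] and (j == 0 or prev[j - 1]) for j in range(k)]
            hit = hit or row[-1]         # last PE of the signature: match && sig_end
            prev = row                   # ripple to the next row (the duplicated '4' carry)
        carry = prev                     # register the last row's state for the next clock
        if hit:
            fired.append(cycle)
    return fired


if __name__ == "__main__":
    # Hypothetical stream: '1' lands in cycle 0, both '4's in cycle 1 (p = 2).
    print(smp_scan("144", "x144", p=2))  # [1]
```

With p = 1 the same function degenerates to one character per cycle (for "144" alone, sig_match fires in cycle 2), which shows how the p rows trade extra comparators for fewer cycles.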

Address Output Logic
• For the SMP to be useful, we also need to know which signatures caused a match
• This is handled by the word match buffer, which records the position of each signature match
• Once the last character has been processed, the match address output logic begins working on the buffer entries

Address Output Logic (continued)
• A binary tree is used to encode the positions of the matching signatures
• When decoding starts, a signal tells the control circuitry that matches are present
• A pointer then propagates up the tree, generating the bits of the final address based on the matches below it
• Binary trees are fast and efficient: processing takes ~M cycles, where M is the number of matches
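One way to realize the tree-based address logic is a bottom-up priority-encoder tree over the word match buffer; this is a sketch under that assumption rather than the paper's exact circuit. Each level merges pairs of nodes and contributes one address bit, preferring the left (lower-address) child. After an address is emitted its buffer bit is cleared, so M matches take M passes. `encode_first` and `drain_matches` are hypothetical names.

```python
def encode_first(bits: list[bool]) -> tuple[bool, int]:
    """Bottom-up priority-encoder tree over a power-of-two-length match buffer.

    Leaves are buffer entries; each level merges pairs of nodes and adds one
    address bit (0 = left/lower child, 1 = right child), so the address is
    built as the result propagates up the tree.
    """
    nodes = [(b, 0) for b in bits]  # (any match below this node, partial address)
    level = 0
    while len(nodes) > 1:
        merged = []
        for i in range(0, len(nodes), 2):
            (l_any, l_addr), (r_any, r_addr) = nodes[i], nodes[i + 1]
            if l_any:
                merged.append((True, l_addr))
            elif r_any:
                merged.append((True, r_addr | (1 << level)))
            else:
                merged.append((False, 0))
        nodes = merged
        level += 1
    return nodes[0]


def drain_matches(buffer: list[bool]) -> list[int]:
    """Emit one matching address per pass, clearing it afterwards (M matches -> M passes)."""
    size = 1
    while size < len(buffer):
        size *= 2
    bits = list(buffer) + [False] * (size - len(buffer))  # pad to a full binary tree
    addresses = []
    while True:
        found, addr = encode_first(bits)
        if not found:
            return addresses
        addresses.append(addr)
        bits[addr] = False
```

For example, a buffer with matches at positions 1 and 4 drains as addresses 1 then 4; the tree depth (log2 of the buffer size) sets the per-pass latency, while the number of passes equals the number of matches.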

FPGA Implementation
• As parallelism increases, throughput increases, while clock frequency decreases because of the added complexity
• As the number of characters increases, area increases, while frequency and throughput decrease

Implementation Comparison (figure)

Critique
• Introduces new terms and refers to unfamiliar prior work
• Difficult to follow in some areas because of inconsistencies and the way the topic is presented
• Much of the paper is procedure and methodology for the implementation
• Very detailed work
• Good examples strengthen the theoretical explanations
• Implementation data is given for comparison purposes

QUESTIONS?

References
• All figures and information in this presentation are taken from: Janardhan Singaraju and John A. Chandy, "FPGA Based String Matching for Network Processing Applications," Microprocessors and Microsystems (ScienceDirect), December 14, 2007.
