FPGA Based String Matching For Network Processing Applications

Janardhan Singaraju, John A. Chandy - Presented by Matthew Reffle

Content
Introduction/Background Information
Current Implementations
Hardware Designs
Results and Applications
Concluding Remarks
Comments and Criticism
Questions

Introduction/Background
What is string matching?
Current implementations:
Software: flexible and reliable, but slower
Hardware: fast, but area and resource consuming
Current uses:
DNS lookup
IP address searches
Network security
- What is string matching? Trying to find the presence of a keyword or phrase in a series of bits or strings.
- There are many software algorithms for string matching.
- Matching can be performed in parallel.
- Exact string matching and partial/related string matching are used in files, databases, and network processing (intrusion detection and directory lookup applications, though those rely more on exact matching).

Different Implementations
Software
Rabin-Karp, Knuth-Morris-Pratt, Boyer-Moore
Good/reliable output
Very flexible
Works on GPP
Slower for networks
Hardware
Not many implementations (2006)
Shift and add
Motomura's cellular automata
Not very flexible
More resources
Very fast
Software algorithms
- There are many string matching algorithms, including Rabin-Karp, Knuth-Morris-Pratt, and Boyer-Moore.
- The Rabin-Karp algorithm uses hash functions to find the match. It compares the hash of the keyword with the hashes of the substrings of the input, which avoids character-by-character matching.
- Knuth-Morris-Pratt and Boyer-Moore are both character-by-character matching algorithms; Boyer-Moore, however, searches right to left and shifts by a predetermined distance.
- As connections get faster and faster, software algorithms end up slowing the connection down, so a new way to search for keywords and phrases is needed. That is where the hardware algorithms come in. They finish in much less time than the software implementations, but they are not necessarily as flexible as software.
Hardware string matching
- Usually preferred over software for data-intensive applications.
- These algorithms (such as the shift and add design) take the graphs and translate them into large blocks of FPGA circuitry. This allows quick transitions and fast search times, but the circuitry covers a large area of the FPGA, so it is not area efficient. Whenever the expression changes, the FPGA has to be reprogrammed.
- One of the most popular hardware algorithms is Motomura's cellular automata structure. One extension of this design is the content addressable memory (CAM). With a CAM the lookup table entries have a fixed length, and so does the keyword being searched. CAMs are used a lot in IP address lookup tables in routers, dictionary-based searching, etc.
- The key goal of this paper by Singaraju and Chandy is to create an algorithm that is more flexible than the CAM design and that allows string matching as well as key-to-value mapping.
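The Rabin-Karp idea from the software notes can be sketched in a few lines. This is only an illustration and is not taken from the paper; the function name and the base and mod values are arbitrary choices made for the example. A rolling hash is compared first, and the characters are compared only when the hashes agree.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003) -> list[int]:
    """Return every index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the leading character in the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    hits = []
    for i in range(n - m + 1):
        # Characters are compared only when the two hashes agree.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return hits
```

For instance, rabin_karp("abracadabra", "abra") reports the two positions where "abra" starts. Every window still costs a hash update on a general purpose processor, which is the sequential work the hardware designs try to avoid.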

Hardware Implementations
String Lookup Cache & Network Intrusion Detection
As I said on the previous slide, software is much more flexible than hardware, so we need to find out how to bridge the flexibility gap between hardware and software. There are two major designs outlined in this paper. The first one I will go over is the String Lookup Cache, which is used for DNS and IP address matching. The second is Network Intrusion Detection, which deals with matching and finding key packets and information related to impending threats to an organization's network.

String Lookup Cache
Mainly used in DNS lookup and IP address mapping
Uses character arrays to match strings, character by character
Implementable on FPGAs
The first architecture I will cover is the String Lookup Cache. It is mainly used for DNS queries, where the entries will not necessarily all be the same length.

Architecture
General lookup cache
Network processor design
32-bit bus for IP address return
- Uses a general lookup cache with a network processor design.
- It uses a 32-bit bus so the IP address can easily be returned. If the address is larger than that, it instead sends a pointer to the memory location where the larger address can be found.
- Packets flow into the network processor, which then queries the lookup cache to find a domain name.
- If the hostname is present in the cache, the IP address is returned.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008

Character Match Array
Made up of an 8xn array of CAM cells
Each CAM cell contains a bit comparator and a storage cell
Each character in ASCII representation
Parallel implementation between characters
- The Character Match Array is made up of an 8 by n array of CAM cells.
- Each CAM cell contains a storage cell and a bit comparator.
- The storage cell is a traditional SRAM cell, used to define the end of a word since we are dealing with variable lengths.
- Using the CAM array you can find character matches.
- Each stored character has its own ASCII code. For a text search, this lets us quickly pick apart a packet and compare it against each individual ASCII representation. For more general uses such as a DNS lookup cache, we can use an 8-bit character array of n characters.
- All bits of a character are compared in parallel, so the comparison completes in one clock cycle.
- In the FPGA implementation each cell consists of a register for the storage cell and an XOR gate for the bit comparator.
- When a stored character matches the requested character, the result for that position is a one.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008
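The register-plus-XOR description of a CAM cell can be modelled in software to see what the array outputs. This is a rough sketch, not the paper's VHDL; the function names are made up for illustration, and the loops stand in for comparisons that the hardware performs in parallel within one clock cycle.

```python
def cam_cell(stored_bit: int, input_bit: int) -> int:
    """One CAM cell: the XOR comparator outputs 0 when the two bits agree."""
    return stored_bit ^ input_bit


def char_match(stored_char: int, input_char: int) -> int:
    """A column of 8 CAM cells: 1 only if all eight bit comparisons agree."""
    mismatch = 0
    for bit in range(8):
        mismatch |= cam_cell((stored_char >> bit) & 1, (input_char >> bit) & 1)
    return 1 - mismatch


def character_match_array(stored: bytes, input_char: int) -> list[int]:
    """Compare one input character against all n stored characters."""
    return [char_match(c, input_char) for c in stored]
```

Feeding the character "w" into an array that stores "www.a" sets a one at the first three positions and a zero at the others.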

Processor Element Array
Uses multi-byte boundary with a possibility to use more PEs for flexibility
After a character match is found, the PE Array shows which word is a match
Uses flags to represent a match
When the flags match a word, the word is found
An m-byte word search takes m clock cycles
- Once a match in the characters is found, we look to the PE Array to tell us which set of characters has been matched.
- When the CAM array has a match, it sets a flag in the PE Array to show that there is a match at the corresponding PE(i) location.
- After the process is complete, the PE(i) location is looked up in a map, or a set of strings associated with that specific output (see example).
- From this it is easy to show that an m-byte word takes m clock cycles to complete, remembering that each byte can be completed in one clock cycle.
- Each keyword is presented on a multi-byte boundary of 32 bits, or 4 bytes. A word can go over this boundary, but it will then take more than one cycle to compute. The 4 bytes are fed into the CAM array, which produces a result as previously stated and outputs a set of 1s and 0s to the PE Array, which decides which word is a match.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008
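The flag behaviour described in these notes can be imitated with a small simulation. It is a simplified sketch under assumptions that are not from the paper: one PE per stored keyword, one character presented per clock cycle (so the 4-byte boundary is ignored), and a length check standing in for the end-of-word marker. The function name is hypothetical.

```python
def pe_array_search(keywords: list[bytes], query: bytes) -> list[int]:
    """Return one match flag per PE after the whole query has been presented."""
    flags = [1] * len(keywords)  # every PE starts out as a candidate
    for cycle, ch in enumerate(query):  # one character per clock cycle (assumption)
        for i, word in enumerate(keywords):  # done in parallel in hardware
            char_hit = 1 if cycle < len(word) and word[cycle] == ch else 0
            flags[i] &= char_hit  # a single mismatch clears the flag for good
    # The stored word must also end where the query ends.
    return [int(f == 1 and len(w) == len(query)) for f, w in zip(flags, keywords)]
```

With the keywords "abc.com", "abd.com" and "abc.co" stored, a query for "abc.com" leaves only the first flag set, after as many cycles as the query has characters.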

Output
Basic overview of high level system
Each character match sets a 1 and each word match sets a 1, otherwise 0.
- PE(i) outputs a specific index location in the map, which allows the matching IP address to be found in the table.
- The table should have as many entries as there are PEs; for instance, with the 4 PEs in this example there are four locations in the lookup table.
- The table then outputs the IP address, or a pointer to a memory location where a larger address can be found.
- To replace or overwrite data entries in the cache, you must first figure out whether the packet that needs to be replaced is greater than 32 bits. If it is, you simply delete it and shift all of the other packets to the right by 8 bytes, as in the previous example.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008
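The mapping from a matched PE to an address can be pictured as a table with one slot per PE. The sketch below is illustrative only: the ("ip", value) / ("ptr", value) tagging, the memory dictionary, and the function name are assumptions, since the slides do not say how a direct address is told apart from a pointer.

```python
import ipaddress
from typing import Optional

# Hypothetical entry format: ("ip", 32-bit address) or ("ptr", memory location).
Entry = tuple[str, int]


def resolve(pe_flags: list[int], table: list[Entry], memory: dict[int, str]) -> Optional[str]:
    """Return the address mapped to the first PE whose match flag is set."""
    if len(table) != len(pe_flags):
        raise ValueError("the table needs exactly one entry per PE")
    for i, flag in enumerate(pe_flags):
        if flag:
            kind, value = table[i]
            if kind == "ip":
                # The address fits on the 32-bit bus and is returned directly.
                return str(ipaddress.IPv4Address(value))
            # Otherwise the value points to where the larger address is stored.
            return memory[value]
    return None  # cache miss
```

A table for four PEs might hold ("ip", 0xC0A80001) in one slot, which comes back as "192.168.0.1", and ("ptr", 0x10) in another, which is followed into memory.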

FPGA Implementation and Applications
Hardware implemented via FPGA dominated software
Searches per second increased 300 times
Throughput well exceeds today's network standards
Common applications:
DNS lookup and IP address mapping
Network storage
Network intrusion
- This is used mainly in a DNS system, where Internet domain names are mapped to IP addresses.
- The design is written in VHDL for a Xilinx Virtex-II Pro FPGA, using the Xilinx ISE design environment for all parts of the design.
- Different sized host files are placed in the CAM array with 32-bit boundaries, or 4 characters long, using a maximum speed of MHz.
- The differences can be seen in the two output tables listed, which use a 1 GHz PowerPC computer for comparison.
- Because of its large throughput, this design can easily work with today's 10 Gb/s or greater network speeds.
- This application can be widely used in DNS lookup, network storage options, and LDAP processing.
- Another big application is Network Intrusion Detection. In that design, however, the CAM array is modified so that the words are unaligned, since a keyword can appear at any position in the incoming data.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008

Network Intrusion Detection System
Process of identifying and analyzing threats to a network
Passive: a secondary node analyzes data coming onto the network
Host: looks at information coming into a specific node (usually a router, gateway or switch)
Software had very poor throughput
Need to increase throughput with hardware
- Network intrusion detection is the process of identifying and analyzing packets that may be an impending threat to an organization's network.
- It can be passive (a secondary node analyzes data coming onto the network) or host based (looking at information coming specifically through this node, gateway, switch or router).
- Software such as Snort has a hard time with this because it cannot even handle 100 Mb/s. To go faster it must drop packets or drop rules.
- The goal of the hardware systems is to increase the throughput of these software designs.

Architecture
Different from Lookup Cache architecture
Needs more precise lookup rules
Control unit must have control over individual sections in the PE array
- Similar to the Lookup Cache design; however, as you can see, it adds a buffer as well as match address output logic.
- This allows the character matching to run sequentially without stopping, while the buffer holds the information until the address logic can handle all of it at once.
- The control unit controls the resetting, buffering and loading of information to the address logic unit.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008

Character Match and Signature Match Array
More of a byte match array
Does not use CAM cells
Needed for multi-length and dictionary type processing as well as mid-byte checking
Used for checking for matches in different processing elements
- The Character Match Array here is more of a byte comparator than a bit comparator like the CAM cells.
- It does not have a storage area; it feeds the information straight through to the Signature Match Array. It does, however, need to be reprogrammed every time a new set of keywords is loaded. Reprogramming is not needed very often, since the keyword set does not change very often.
- The Signature Match Array is very similar to the PE Array in the Lookup Cache structure. The major difference is the forwarding of information from one PE to another.
- It also needs to map multiple PEs to deal with larger words.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008

Address Output and Control Logic
Stores information from PE Arrays in a buffer in order to figure out the position of the matching word
Control logic resets and manages each array and memory buffer
- The address output logic takes the information from the PE Array, matches it using a binary tree, and stores it in a buffer.
- Using an MAO (matched address output) logic block, we can produce the output in a binary tree representation.
- You can see from this figure that the two blocks of the MAO are pipelined in order to increase the clock frequency.
- They found that more pipelining did not improve the output of the design.
- The address logic is separate from the other entities and does not rely on the other parts to continue; it can run while comparisons are being done on a new packet.
The Controller
- It resets the Signature Array when new packets of data are ready to be matched.
- You do not need to worry about overwriting information, since the information is stored in a buffer and is not affected by the resetting process.
- After all of the data has been put through the byte comparators, the controller enables the address logic.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008
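One way to picture a binary-tree address output is as a pairwise reduction over the PE match flags, as in a priority encoder. The sketch below is an assumption about how such a tree could work, not the MAO block from the paper; giving priority to the lowest-numbered PE and the function name are choices made for the example, and the pipelining is not modelled.

```python
def mao_encode(flags: list[int]) -> tuple[int, int]:
    """Reduce the PE match flags pairwise; return (valid, index of first set flag)."""
    size = 1
    while size < len(flags):
        size *= 2  # pad up to a power of two so the tree is complete
    level = [(1 if f else 0, 0) for f in flags] + [(0, 0)] * (size - len(flags))
    width = 1  # number of leaves covered by each node at the current level
    while len(level) > 1:
        merged = []
        for (l_valid, l_idx), (r_valid, r_idx) in zip(level[0::2], level[1::2]):
            if l_valid:
                merged.append((1, l_idx))  # the left subtree wins (assumption)
            elif r_valid:
                merged.append((1, width + r_idx))  # offset into the right subtree
            else:
                merged.append((0, 0))
        level = merged
        width *= 2
    return level[0]
```

Each pass of the while loop corresponds to one level of the tree, so n flags are reduced in about log2(n) levels. That depth is what makes splitting the logic into pipeline stages attractive for clock frequency.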

Output
Multi-byte matching
Sends “fl44” over 2 clock cycles
Finds match for “l44” target string
- Since the address logic can run in parallel with the signature match, this greatly reduces the time to produce the output.
- When we send a packet of "fl44", we can see how it is processed through this figure.
- When a match is computed, the system sends an alert or an interrupt to the CPU to deal with the potential threat to the network.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008
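The "fl44" example can be replayed with a small model of PEs that forward their match state to the next PE. This is a sketch under assumptions: the figure is not available here, so the chain layout, the function name, and the choice of 2 bytes per clock cycle (inferred from 4 bytes taking 2 cycles) are all hypothetical.

```python
from typing import Optional


def signature_match(packet: bytes, signature: bytes, bytes_per_cycle: int = 2) -> Optional[int]:
    """Return the 1-based clock cycle in which the signature is fully matched, or None."""
    # state[j] == 1 means signature[:j + 1] ends at the byte just processed.
    state = [0] * len(signature)
    for start in range(0, len(packet), bytes_per_cycle):
        cycle = start // bytes_per_cycle + 1
        for byte in packet[start:start + bytes_per_cycle]:
            # PE j fires if its byte matches and PE j-1 fired on the previous byte.
            state = [
                int(byte == signature[j] and (j == 0 or state[j - 1] == 1))
                for j in range(len(signature))
            ]
            if state and state[-1]:
                return cycle  # this is where an alert or interrupt would be raised
    return None
```

Because the first PE can fire on any byte, the signature "l44" is found inside "fl44" even though it does not start on a boundary.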

FPGA Implementation
Throughput increases as parallelism increases
Size increases as parallelism increases
Not able to fully implement all the rules, about half
- As parallelism increases so does throughput; however, the size (area and resources used) also increases.
- The Virtex-II Pro was not able to fully implement all rules because of resource utilization.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008

Concluding Design Remarks
String Lookup Cache improves dramatically over software
Network Intrusion Detection has been completed before
Comparable to other works, better in logic cells
No other outstanding improvements
- The String Lookup Cache has improved string matching by a huge factor; as we saw before, it improves throughput by 300 times. This is a huge improvement, and it has proven to be a very good replacement for outdated software solutions.
- Network Intrusion Detection was a little more complex than the Lookup Cache, and it has a lot smaller throughput compared to the Lookup Cache.
- These designs have already been done by many other companies.
- There are not many big improvements over other companies; however, their design increases the number of characters while decreasing the number of logic cells per character. From this you can see that this design might use a smaller area compared to other designs.
Singaraju, J., Chandy, J. A. “FPGA Based String Matching for Network Processing Applications”, 2008

Comments and Criticism
Well written
Well documented
Very detailed
Good references
Great improvements for Lookup Cache
No strong improvements in Network Intrusion
Virtex-II Pro was not able to fully implement Network Intrusion design
May be very useful to implement this using a higher end model today

Questions?
