Gregex: GPU based High Speed Regular Expression Matching Engine
Date: 101/1/11
Publisher: 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing
Authors: Lei Wang, Shuhui Chen, Yong Tang, Jinshu Su
Presenter: Shi-qu Yu
INTRODUCTION Gregex is a Graphics Processing Unit (GPU) based regular expression matching engine for deep packet inspection (DPI). Gregex leverages the computational power and high memory bandwidth of GPUs by storing data in the appropriate GPU memory spaces and executing a massive number of GPU threads concurrently to process many packets in parallel.
THE PROPOSED GREGEX- Framework
The matching result buffer is a one-dimensional array allocated in global device memory; the size of the array is equal to the number of packets that the GPU processes at a time.
THE PROPOSED GREGEX- Workflow
1. Pre-processing phase
2. Signature matching phase
3. Post-processing phase
Pre-processing phase
Compiling regular expressions to a DFA: once the DFA has been constructed, the state transition table is copied to the texture memory of the GPU in two steps:
1. Copy the state transition table from CPU memory to GPU global memory;
2. Bind the state transition table in global memory to the texture cache.
Transferring packets to the GPU: Gregex copies packets to device memory in batches.
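The flat layout of a state transition table, in the form in which it could be copied to the device as one contiguous array, can be sketched on the CPU. This is a minimal Python sketch and not Gregex's code: the helper names `build_literal_dfa` and `next_state`, the 256-column row layout, and the use of a hand-built DFA for a single literal string (instead of a DFA compiled from regular expressions) are all assumptions made for illustration.

```python
from array import array

ALPHABET = 256  # assumed layout: one table column per possible byte value


def build_literal_dfa(pattern: bytes) -> array:
    """Return a flat DFA table that recognizes `pattern` anywhere in the input.

    Entry table[state * ALPHABET + byte] holds the next state. States are
    0..len(pattern); the last state is accepting and absorbing.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    n = len(pattern)
    table = array("I", [0] * ((n + 1) * ALPHABET))
    table[pattern[0]] = 1  # row 0: the first pattern byte advances to state 1
    fallback = 0  # state to resume from after a mismatch (KMP-style)
    for state in range(1, n):
        row, fb_row = state * ALPHABET, fallback * ALPHABET
        # on a mismatch, behave like the fallback state ...
        table[row:row + ALPHABET] = table[fb_row:fb_row + ALPHABET]
        # ... and on a match, advance by one state
        table[row + pattern[state]] = state + 1
        fallback = table[fb_row + pattern[state]]
    # once the pattern has been seen, stay in the accepting state
    for byte in range(ALPHABET):
        table[n * ALPHABET + byte] = n
    return table


def next_state(table: array, state: int, byte: int) -> int:
    """One table lookup, the operation performed for each input byte."""
    return table[state * ALPHABET + byte]
```

Because the table is a single contiguous array of unsigned integers indexed by `state * ALPHABET + byte`, it can be moved as one block, which is the property the two-step copy above relies on.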
Signature matching phase
Post-processing phase When all GPU threads finish matching, the matching result array is copied to CPU memory. The k-th cell of the matching result array contains the ID of the regular expression that matches the k-th packet; if no match occurs, it is set to zero.
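The meaning of the matching result array can be illustrated with a small CPU-side sketch. It is written in Python rather than as a GPU kernel, and everything named here is hypothetical: the three-state demo DFA, the `ACCEPT_ID` mapping with regular expression IDs 5 and 9, and the functions `match_packet` and `match_batch`. The loop over `k` stands in for the parallel GPU threads, assuming one thread per packet.

```python
ALPHABET = 256  # assumed layout: one table column per possible byte value


def make_demo_table():
    """Hand-built DFA: state 1 is reached after byte 'A', state 2 after byte 'B'."""
    table = [0] * (3 * ALPHABET)
    table[0 * ALPHABET + ord("A")] = 1
    table[0 * ALPHABET + ord("B")] = 2
    for byte in range(ALPHABET):
        table[1 * ALPHABET + byte] = 1  # accepting states are absorbing
        table[2 * ALPHABET + byte] = 2
    return table


# accepting state -> regular expression ID (hypothetical IDs; 0 is reserved for "no match")
ACCEPT_ID = {1: 5, 2: 9}


def match_packet(table, accept_id, packet):
    """Walk the DFA over one packet and return the matching ID, or 0."""
    state = 0
    for byte in packet:
        state = table[state * ALPHABET + byte]
        if state in accept_id:
            return accept_id[state]
    return 0


def match_batch(table, accept_id, packets):
    """Fill a result array with one cell per packet in the batch."""
    results = [0] * len(packets)  # same length as the batch
    for k, packet in enumerate(packets):  # stands in for one GPU thread per packet
        results[k] = match_packet(table, accept_id, packet)
    return results


results = match_batch(make_demo_table(), ACCEPT_ID, [b"xxA", b"B", b"xyz"])
print(results)  # [5, 9, 0]
```

Reserving zero for "no match" means the regular expression IDs themselves have to start at one, so that a zero cell is never ambiguous.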
Optimizations 1) Asynchronous packet Transfer with Page-locked memory (ATP). Asynchronous copy: transfers issued with the cudaMemcpyAsync function are non-blocking, so control is returned immediately to the host thread. Zero copy: zero copy requires mapped page-locked memory and enables GPU threads to directly access host memory.
Optimizations 2) Coalesced global memory Access by Buffering packets to shared memory (CAB). In this work, coalesced global memory access is obtained by having each half warp read contiguous locations of global memory into shared memory. We use s_packets, a 32×32 shared memory array of 32-bit words, to "buffer" packets from global memory for every thread.
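The difference between the two access patterns comes down to index arithmetic, which can be checked on the CPU. The following Python sketch is an illustration under stated assumptions, not the kernel from the paper: it assumes a half warp of 16 threads, packets stored back to back in global memory as 32 words each, and hypothetical helper names (`naive_addresses`, `cooperative_addresses`, `load_to_shared`). Addresses are expressed in 32-bit words.

```python
HALF_WARP = 16         # threads whose memory requests are considered together
WORDS_PER_PACKET = 32  # assumed: one row of the 32x32 buffer per packet
N_PACKETS = 32         # assumed: one packet per thread of a 32-thread block


def naive_addresses(word_index):
    """Thread t reads word `word_index` of its own packet t: strided access."""
    return [t * WORDS_PER_PACKET + word_index for t in range(HALF_WARP)]


def cooperative_addresses(packet_index, half):
    """The half warp jointly reads 16 consecutive words of one packet."""
    base = packet_index * WORDS_PER_PACKET + half * HALF_WARP
    return [base + t for t in range(HALF_WARP)]


def is_contiguous(addresses):
    """True when neighbouring threads touch neighbouring words."""
    return all(b - a == 1 for a, b in zip(addresses, addresses[1:]))


def load_to_shared(global_words):
    """Copy all packets into a 32x32 buffer using only contiguous reads."""
    s_packets = [[0] * WORDS_PER_PACKET for _ in range(N_PACKETS)]
    for i in range(N_PACKETS):
        for half in range(WORDS_PER_PACKET // HALF_WARP):
            for t, addr in enumerate(cooperative_addresses(i, half)):
                s_packets[i][half * HALF_WARP + t] = global_words[addr]
    return s_packets


print(is_contiguous(naive_addresses(0)))           # False: stride of 32 words
print(is_contiguous(cooperative_addresses(2, 1)))  # True: words 80..95
```

After the buffer is filled, thread `t` would read its packet from row `t` of the buffer instead of from global memory, so the strided pattern is confined to shared memory.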
EVALUATION RESULTS The test platform is a PC with a 2.66 GHz Intel Core 2 Duo processor, 4 GB of memory, and an NVIDIA GeForce GTX 260 GPU card. The GTX 260 GPU contains 216 SPs organized in 27 SMs, running at 1.35 GHz, with 896 MB of global memory. Gregex uses signatures from the rule set released with Snort 2.7. The rule set consists of 56 different signature sets.
Packets Transfer Performance
Regular Expression Matching Performance
Overall throughput of Gregex