GPEP: Graphics Processing Enhanced Pattern-Matching for High-Performance Deep Packet Inspection
Authors: Lucas John Vespa, Ning Weng
Publisher: 2011 IEEE International Conferences on Internet of Things, and Cyber, Physical and Social Computing (4th CPSCom)
Presenter: Ye-Zhi Chen
Date: 2012/04/25
Introduction: GPEP uses an optimized version of the authors' pattern-matching algorithm, P3FSM, which has low operational complexity and reduces the memory requirement so that the state tables fit into the small on-chip memories of a GPU.
P3FSM 1. DFA Optimization (split-DFA): This optimization splits the DFA transitions into primary and secondary blocks at the first level of the DFA. All incoming transitions to the primary block are removed from the DFA. Figure 1(b) shows an example split-DFA. The two blocks are encoded into two separate memory tables. If a transition for the current input character is not present in the secondary block table, the primary block table acts as the default transition lookup.
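The secondary-then-primary lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the patterns ("he" and "she"), the state numbering, the table layout, and the names `next_state` and `scan` are all assumptions made for the example, as is the choice of falling back to start state 0 when a character has no entry in the primary table.

```python
# Hypothetical split-DFA for the patterns "he" and "she".
# States: 0 = start, 1 = "h", 2 = "he", 3 = "s", 4 = "sh", 5 = "she".

# Primary block: default transitions into first-level states, keyed by character only.
PRIMARY = {"h": 1, "s": 3}

# Secondary block: the remaining transitions, keyed by (current state, character).
SECONDARY = {(1, "e"): 2, (3, "h"): 4, (4, "e"): 5}

# Patterns reported when a state is reached (assumed for this example).
ACCEPTING = {2: ["he"], 5: ["she", "he"]}


def next_state(state, ch):
    """Look in the secondary table first; fall back to the primary table."""
    if (state, ch) in SECONDARY:
        return SECONDARY[(state, ch)]
    # No secondary entry: the primary table supplies the default transition.
    # Characters absent from both tables return to start state 0 (assumption).
    return PRIMARY.get(ch, 0)


def scan(text):
    """Return (end_index, pattern) for every match found in text."""
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = next_state(state, ch)
        for pattern in ACCEPTING.get(state, []):
            hits.append((i, pattern))
    return hits
```

Calling `scan("ushers")` walks the states 0, 3, 4, 5, 0, 3 and reports both "she" and "he" ending at index 3. Only three secondary entries are stored, because every transition into the first-level states 1 and 3 is served by the primary table.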
P3FSM [Figure: split-DFA showing the primary and secondary blocks]
P3FSM 2. Deriving State Codes:
(1) Group all states in the split-DFA (SDFA) that have the same next state.
(2) Combine groups that share the same character into a cluster.
(3) Reduce the number of clusters by merging all clusters that have no common states in the secondary block into one cluster.
(4) Encode the groups:
  a) Character Signature (cs): identifies the character required for transitions to a state.
  b) State Signature (ss): identifies the next state.
(5) State code: obtain the code for each state by concatenating the group codes of the groups that the state is a member of.
P3FSM example groups, written as group [member states][character]:
G1 [S0][H]
G2 [S0][S]
G3 [S1][E]
G4 [S1 S5][I]
G5 [S2 S7 S9][H]
G6 [S3 S8][R]
G7 [S4][S]
G8 [S5][E]
G9 [S6][S]
P3FSM [Figure: groups for the characters H, S, E, R, I arranged into clusters C1 and C2]
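Steps (1) and (2) of the state-code derivation, grouping states by next state and then combining groups by character, can be sketched as follows. This is an illustrative sketch only: the transition table, the function names `derive_groups` and `derive_clusters`, and the tuple layout are assumptions for the example. The cluster merging of step (3) and the signature encoding of step (4) are deliberately left out.

```python
from collections import defaultdict


def derive_groups(transitions):
    """Step (1): group source states that share the same next state.

    transitions maps (state, character) -> next_state.
    Returns a list of (member_states, character, next_state) tuples.
    """
    members = defaultdict(set)
    for (state, ch), nxt in transitions.items():
        members[(nxt, ch)].add(state)
    # Sort by (next_state, character) so group numbering is deterministic.
    return [(frozenset(states), ch, nxt)
            for (nxt, ch), states in sorted(members.items())]


def derive_clusters(groups):
    """Step (2): combine groups with the same character into one cluster.

    Returns a dict mapping character -> list of group indices.
    """
    clusters = defaultdict(list)
    for index, (_states, ch, _nxt) in enumerate(groups):
        clusters[ch].append(index)
    return dict(clusters)


# Hypothetical secondary-block transitions, chosen only to exercise the code.
EXAMPLE = {
    (1, "e"): 2,
    (4, "e"): 2,   # states 1 and 4 share next state 2, so they form one group
    (3, "h"): 4,
    (2, "r"): 5,
    (5, "e"): 6,   # a second group on character "e"
}

example_groups = derive_groups(EXAMPLE)
example_clusters = derive_clusters(example_groups)
```

With this input, states 1 and 4 fall into a single group because both move to state 2 on "e", and the two groups that use "e" land in the same cluster. A state's code would then be built by concatenating the codes of every group in which it appears, as step (5) describes.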
P3FSM Operating Tables: (1) Character/Cluster Table (cc); (2) Code Table (code). Code table index: $S_{\text{index}} = Ch_{\text{offset}} + S_{\text{sig}}$. [Figure label: failure index]
Memory Efficiency:
Equation 1: $\mathrm{STT} = Q \cdot \lceil \log_2 Q \rceil \cdot 2^8$, where $Q$ is the total number of states of the DFA.
Equation 2: $\mathrm{P3FSM} = Q \cdot (L + \lceil \log_2 P \rceil)$, where $L$ is the length of the state code and $P$ is the number of patterns to be detected.
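To get a feel for the difference between the two equations, the short sketch below evaluates both for one set of inputs. The function names and the sample values (Q = 1000 states, L = 20, P = 100 patterns) are hypothetical and chosen only for illustration. Reading the results as bits is also an assumption, based on each state-table entry needing $\lceil \log_2 Q \rceil$ bits.

```python
def ceil_log2(n):
    """Exact ceil(log2(n)) for integers n >= 1, avoiding floating point."""
    return (n - 1).bit_length()


def stt_size(q):
    """Equation 1: full state transition table, Q * ceil(log2 Q) * 2^8."""
    return q * ceil_log2(q) * 2 ** 8


def p3fsm_size(q, code_len, patterns):
    """Equation 2: P3FSM tables, Q * (L + ceil(log2 P))."""
    return q * (code_len + ceil_log2(patterns))


# Hypothetical inputs, not taken from the paper.
Q, L, P = 1000, 20, 100
stt = stt_size(Q)             # 1000 * 10 * 256
p3fsm = p3fsm_size(Q, L, P)   # 1000 * (20 + 7)
```

For these sample values the full table comes to 2,560,000 and the P3FSM tables to 27,000, roughly a 95-fold reduction. The factor $2^8$ in Equation 1 comes from storing one entry per possible input byte for every state, which Equation 2 avoids.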
GPEP ARCHITECTURE
Host: The host creates and optimizes the DFA. The host transfers the resulting tables to the memory of the GPU. The host also maintains the current packet buffer, which is mapped to the global memory of the GPU.
GPEP ARCHITECTURE Device: The memory tables necessary for P3FSM kernel operation are stored in the local data store (LDS) of each compute unit and in the private memory of each stream core.