Memory-Efficient Regular Expression Search Using State Merging
Authors: Michela Becchi, Srihari Cadambi
Publisher: INFOCOM th IEEE International Conference on Computer Communications. IEEE
Presenter: Ching-Hsuan Shih
Date: 2014/04/09
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.
Outline
Introduction
Related Work
State Merging: A Motivational Example
State Merging in DFAs
Bitmap-based Data Structures for DFAs
Experimental Results
National Cheng Kung University CSIE Computer & Internet Architecture Lab
Introduction (1/2)
Network Intrusion Detection System (NIDS): A device or software application that monitors a network for malicious activity. Most IDSs inspect network packets, system logs, or network flows.
Regular Expression: Current rule sets such as Snort, Bro, and many others are replacing strings with the more powerful and expressive regular expressions.
Introduction (2/2)
The classical method to perform regular expression search is to use a deterministic finite automaton (DFA).
The main problem with DFAs is prohibitive memory usage: the number of states in a DFA scales poorly with the size and number of wildcards in the regular expressions it represents.
We propose a novel technique that allows non-equivalent states in a DFA to be merged using a scheme where the transitions in the DFA are labeled.
Related Work
Delayed DFA (D2FA) [6]: It identifies two (or more) states that transition to the same set of destinations on the same input characters. D2FA achieves memory compaction by removing the duplicated transitions and replacing them with a single default transition, but this comes at the expense of latency: a state with a default transition may require more than one transition per input character.
In [14]: The authors propose increasing the speed of regular expression search by expanding the alphabet. They process two characters (bytes) for every state transition in the DFA. This produces an exponential increase in memory usage.
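The latency cost of default transitions can be illustrated with a minimal sketch. The three-state automaton below is hypothetical (its states and transitions are not taken from [6] or from the paper); it only assumes that each state stores the transitions in which it differs from its default target, and that the root state stores a transition for every character of the alphabet in use.

```python
# Hypothetical D2FA fragment over the alphabet {a, b, c}; illustrative only.
# LABELED holds the transitions a state stores explicitly.
LABELED = {
    "A": {"a": "B", "b": "C", "c": "A"},  # root: stores every transition
    "B": {"b": "A"},                      # stores only what differs from A
    "C": {"c": "B"},                      # stores only what differs from B
}
# DEFAULT holds the single default transition of each non-root state.
DEFAULT = {"B": "A", "C": "B"}


def step(state, ch):
    """Consume one character; return (next_state, number_of_lookups)."""
    lookups = 1
    while ch not in LABELED[state]:
        # Follow the default transition without consuming the input character.
        state = DEFAULT[state]
        lookups += 1
    return LABELED[state][ch], lookups


if __name__ == "__main__":
    for start, ch in [("A", "a"), ("B", "c"), ("C", "a")]:
        print(start, ch, step(start, ch))
```

In this sketch, state C stores one transition instead of three, but consuming 'a' from C walks the default chain C, B, A before a stored transition is found, which is the memory-for-latency trade described above.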
State Merging: A Motivational Example (1/4)
State Merging: A Motivational Example (2/4)
The merged state is represented as 3_4.
The transition [g-i]/0, j/1 indicates that the same next state, in this case state 5, is reached from state 3_4 upon receiving input character g, h, or i with label 0, or input character j with label 1.
State Merging: A Motivational Example (3/4)
State Merging: A Motivational Example (4/4)
The merged state is represented as 1_2.
The transition a.0/0,1 from state 3_4 to state 1_2 means:
The transition carries a label 0 that tells its destination state, 1_2, that the transition is meant for the underlying original state 1.
The transition is taken when its source state 3_4 receives label 0 or 1.
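The labeled transitions described in this example can be sketched as a small lookup table. This is a minimal sketch, not the paper's data structure: it encodes only the transitions named on these slides ([g-i]/0 and j/1 from 3_4 to 5, and a.0/0,1 from 3_4 to 1_2). The table layout, the function name `step`, and the use of `None` as the destination label for the unmerged state 5 are assumptions made for illustration.

```python
def build_fragment():
    """Build the transitions of merged state 3_4 that the slides describe.

    Key: (state, input character).
    Value: list of (source_labels, next_state, destination_label).
    """
    table = {}
    for ch in "ghi":
        # [g-i]/0: taken only when 3_4 was entered with label 0.
        table[("3_4", ch)] = [(frozenset({0}), "5", None)]
    # j/1: taken only when 3_4 was entered with label 1.
    table[("3_4", "j")] = [(frozenset({1}), "5", None)]
    # a.0/0,1: taken with label 0 or 1; hands label 0 to merged state 1_2.
    table[("3_4", "a")] = [(frozenset({0, 1}), "1_2", 0)]
    return table


TABLE = build_fragment()


def step(table, state, label, ch):
    """Return (next_state, destination_label), or None if the fragment
    holds no transition for this state, label, and character."""
    for source_labels, next_state, dest_label in table.get((state, ch), []):
        if label in source_labels:
            return next_state, dest_label
    return None


if __name__ == "__main__":
    print(step(TABLE, "3_4", 0, "g"))  # label 0 selects the [g-i] transition
    print(step(TABLE, "3_4", 1, "a"))  # arrives at 1_2 carrying label 0
```

The label that a merged state was entered with selects which original state's behavior applies, so 3_4 can stand in for both state 3 and state 4 without confusing their outgoing transitions.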
State Merging in DFAs (1/3)
A. Labels
For every transition connecting two merged states, we define source labels and destination labels, e.g. c.l_d / l_0, l_1, … where c is the input character, l_d is the destination label, and l_0, l_1, … are the source labels.
B. Legality of State Merging
State Merging in DFAs (2/3)
C. Merging and Labeling Algorithm
State Merging in DFAs (3/3)
Bitmap-based Data Structure for DFAs (1/3) Basic:
Bitmap-based Data Structure for DFAs (2/3) Bitmap-based:
Bitmap-based Data Structure for DFAs (3/3) Bitmap-based merged data structure:
Experimental Results (1/2)
Note that the Snort rule sets have lower percentages of distinct next-state transitions than the Bro rule sets. This is due to the large number of character ranges (both in the form [c1-c2] and \d, \D, \w, \W, \s, \S) and to the fact that Snort regular expressions are case-insensitive.
Experimental Results (2/2)
The width of the transition table is set to 32 bits.