Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection
Sailesh Kumar, Sarang Dharmapurikar, Fang Yu, Patrick Crowley, Jonathan Turner
Presented by: Sailesh Kumar
2 - Sailesh Kumar - 12/6/2015
Overview
• Why is regular expression acceleration important?
• Introduction to our approach
  » Delayed Input DFA (D²FA)
• D²FA construction
• Simulation results
• Memory mapping algorithm
• Conclusion
Why Regular Expressions Acceleration?
• RegEx are now widely used
  » Network intrusion detection systems (NIDS)
  » Layer 7 switches, load balancing
  » Firewalls, filtering, authentication and monitoring
  » Content-based traffic management and routing
• RegEx matching is expensive
  » Space: large amount of memory
  » Bandwidth: requires 1+ state traversals per byte
• RegEx matching is a performance bottleneck
  » In enterprise switches from Cisco, etc.
  » Cisco security appliances
    – Use DFA, 1+ GB memory, still sub-gigabit throughput
  » Need to accelerate RegEx!
Can we do better?
• Well studied in compiler literature
  » What’s different in networking?
  » Can we do better?
• Construction time versus execution time (grep)
  » Traditionally, (construction + execution) time is the metric
  » In a networking context, execution time is critical
  » Also, there may be thousands of patterns
• DFAs are fast
  » But can have an exponentially large number of states
  » Algorithms exist to minimize the number of states
  » Still 1) low performance and 2) gigabytes of memory
• How to achieve high performance?
  » Use ASIC/FPGA
    – On-chip memories provide ample bandwidth
    – Volume and need for speed justify a custom solution
  » Limited memory, need a space-efficient representation!
Introduction to Our Approach
• How to represent DFAs more compactly?
  » Can’t reduce the number of states
  » How about reducing the number of transitions?
    – 256 transitions per state
    – 50+ distinct transitions per state (real-world datasets)
    – Need at least 50+ words per state
[Figure: DFA for the three rules a+, b+c, c*d+; 4 transitions per state]
Look at state pairs: there are many common transitions. How to remove them?
Introduction to Our Approach
[Figure: DFA for the three rules a+, b+c, c*d+ (4 transitions per state), shown next to an alternative representation of the same automaton]
Alternative representation: fewer transitions, less memory
D²FA Operation
[Figure: DFA and D²FA side by side; input stream: a b d]
The DFA and the D²FA visit the same accepting state after consuming a character.
Heavy edges are called default transitions.
Take a default transition whenever a labeled transition is missing.
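The lookup rule on this slide can be sketched in a few lines of Python. The automaton below is a hypothetical toy D²FA (not the one drawn in the figure), and the names `LABELED`, `DEFAULT`, `step` and `run` are illustrative assumptions rather than anything from the paper.

```python
# Toy D2FA (hypothetical; not the automaton drawn on the slide).
# LABELED holds the transitions that are still stored per state,
# DEFAULT holds at most one default transition per state.
LABELED = {
    1: {"a": 2, "b": 3, "c": 1, "d": 4},  # root: keeps all of its transitions
    2: {},
    3: {"c": 5},                          # only the transition that differs from state 1
    4: {},
    5: {},
}
DEFAULT = {2: 1, 3: 1, 4: 1, 5: 1}


def step(state, ch):
    """Consume one character; return (next_state, default_hops_taken)."""
    hops = 0
    while ch not in LABELED[state]:
        state = DEFAULT[state]  # a default transition consumes no input
        hops += 1
    return LABELED[state][ch], hops


def run(start, text):
    """Feed text through the D2FA; return (final_state, total_default_hops)."""
    state, total_hops = start, 0
    for ch in text:
        state, hops = step(state, ch)
        total_hops += hops
    return state, total_hops


final_state, hops = run(1, "abd")
print(final_state, hops)
```

Every default hop is an extra state lookup that consumes no input, which is why the length of default paths matters for throughput.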
D²FA Operation
[Figure: the DFA, the D²FA, and two alternative sets of default transition trees]
Any set of default transitions will suffice if there are no cycles of default transitions.
Thus, we need to construct trees of default transitions.
The two alternative sets of default transition trees in the figure are also correct.
However, with them we may traverse 2 default transitions to consume a character.
Thus, we need to do more work => lower performance.
So, how to construct space-efficient D²FAs while keeping default paths bounded?
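The two conditions on this slide (no cycles of default transitions, bounded default paths) can be checked mechanically. The following is a minimal sketch, assuming the default transitions are given as a hypothetical `{state: default_target}` dictionary; the function name is an illustrative choice.

```python
def default_path_lengths(default):
    """Return {state: number of default hops needed to reach its root}.

    `default` maps a state to the target of its default transition;
    states without an entry are roots. Raises ValueError on a cycle.
    """
    lengths = {}
    for start in default:
        path, seen, cur = [], set(), start
        # Walk toward the root until we hit a root or an already-solved state.
        while cur in default and cur not in lengths:
            if cur in seen:
                raise ValueError(f"cycle of default transitions through state {cur}")
            seen.add(cur)
            path.append(cur)
            cur = default[cur]
        base = lengths.get(cur, 0)  # 0 when cur is a root
        for i, node in enumerate(reversed(path), start=1):
            lengths[node] = base + i
    return lengths


star = {2: 1, 3: 1, 4: 1, 5: 1}   # every state defaults to the root
chain = {2: 1, 3: 2, 4: 3}        # a chain: 4 -> 3 -> 2 -> 1
print(max(default_path_lengths(star).values()))
print(max(default_path_lengths(chain).values()))
```

Both example sets are valid trees, but the chain forces up to three default hops for a single input character, while the star never needs more than one.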
D²FA Construction
• We present a systematic approach to construct a D²FA
• Begin with a state-minimized DFA
• Construct the space reduction graph
  » Undirected graph; vertices are the states of the DFA
  » Edges exist between vertices with common transitions
  » Weight of an edge = # of common transitions
[Figure: example DFA]
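The space reduction graph described above is straightforward to compute from a transition table. The sketch below uses a hypothetical five-state toy DFA over the alphabet {a, b, c, d} (not the slide's figure); `TOY_DFA` and `space_reduction_graph` are illustrative names.

```python
from itertools import combinations

# Hypothetical toy DFA: state -> {input character -> next state}.
TOY_DFA = {
    1: {"a": 2, "b": 3, "c": 1, "d": 4},
    2: {"a": 2, "b": 3, "c": 1, "d": 4},
    3: {"a": 2, "b": 3, "c": 5, "d": 4},
    4: {"a": 2, "b": 3, "c": 1, "d": 4},
    5: {"a": 2, "b": 3, "c": 1, "d": 4},
}


def space_reduction_graph(dfa):
    """Return weighted undirected edges (weight, u, v) with u < v.

    The weight is the number of input characters on which states u and v
    move to the same next state, i.e. their common transitions.
    """
    edges = []
    for u, v in combinations(sorted(dfa), 2):
        weight = sum(1 for ch, nxt in dfa[u].items() if dfa[v].get(ch) == nxt)
        if weight > 0:
            edges.append((weight, u, v))
    return edges


for weight, u, v in space_reduction_graph(TOY_DFA):
    print(f"{u} -- {v}: {weight} common transitions")
```

In this toy table, state 3 differs from every other state only on input c, so its edges have weight 3 while all other pairs have weight 4.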
D²FA Construction
• Convert certain edges into default transitions
  » A default transition eliminates w transitions (w = weight of the edge)
  » If we pick high-weight edges => more space reduction
  » Find a maximum weight spanning forest
  » Tree edges become the default transitions
• Problem: the spanning tree may have a very large diameter
  » Longer default paths => lower performance
[Figure: example DFA with a root selected; # of transitions removed = 11]
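A maximum weight spanning forest can be found with the standard Kruskal procedure run on edges sorted by decreasing weight. The sketch below is plain Kruskal with a union-find structure, with no diameter bound; the edge list is a hypothetical example (the space reduction graph of a five-state toy DFA), not data from the paper.

```python
# Hypothetical weighted edges (weight, u, v) of a space reduction graph.
EDGES = [
    (4, 1, 2), (3, 1, 3), (4, 1, 4), (4, 1, 5), (3, 2, 3),
    (4, 2, 4), (4, 2, 5), (3, 3, 4), (3, 3, 5), (4, 4, 5),
]


def max_weight_spanning_forest(edges):
    """Kruskal's algorithm on edges sorted by decreasing weight."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges, reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two different trees: keep it
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree


forest = max_weight_spanning_forest(EDGES)
removed = sum(weight for weight, _, _ in forest)
print(forest)
print("transitions removed:", removed)
```

The sum of the chosen edge weights is the number of labeled transitions that the resulting default transitions make redundant. Each tree edge still has to be oriented toward a chosen root before it can serve as a default transition.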
D²FA Construction
• We need to construct bounded-diameter trees
  » NP-hard
  » A small diameter bound leads to low tree weight
    – Less space-efficient D²FA
  » Time-space trade-off
• We propose a heuristic algorithm based upon Kruskal’s algorithm to create compact bounded-diameter D²FAs
[Figure: example DFA]
D²FA Construction
• Our heuristic incrementally builds the spanning tree
  » Whenever there is an opportunity, keep the diameter small
  » Based upon Kruskal’s algorithm
  » Details in the paper
Results
• We ran experiments on
  » Cisco RegEx rules
  » Linux application protocol classifier rules
  » Bro rules
  » Snort rules (subset of rules)
Size of DFA versus D²FA (no default path length bound applied)
Space-Time Tradeoff
Longer default path => more work but less space
Space-efficient region:
  – Default paths have length 4+
  – Requires 4+ memory accesses per character
We propose a memory architecture which enables us to consume one character per clock cycle
Summary of Memory Architecture
• We propose an on-chip ASIC architecture
  » Use multiple embedded memories to store the D²FA
    – Flexibility
    – Frequent changes to rules
• D²FA requires multiple memory accesses
  » How to execute the D²FA at memory clock rates?
• We have proposed a deterministic, contention-free memory mapping algorithm
  » Uniform access to memories
  » Enables the D²FA to consume a character per memory access
  » Nearly zero memory fragmentation
    – All memories are uniformly used
• Details and results in the paper
• At 300 MHz we achieve 5 Gbps worst-case throughput
Conclusion
• Deep packet inspection has become challenging
  » RegEx are used to specify rules
  » Wire-speed inspection
• We presented an ASIC-based architecture to perform RegEx matching at tens of gigabits per second
• As suggested in the public review, this paper is not the final answer to RegEx matching
  » But it is a good start
• We are presently developing techniques to perform fast RegEx matching using commodity memories
  » Collaborators are welcome!!!
Thank you and Questions?
Backup Slides
D²FA Construction
• Our heuristic incrementally builds the spanning tree
  » Whenever there is an opportunity, keep the diameter small
  » Details in the paper
• Graph with 31 states, max. weight default transition tree
  » Our heuristic creates shorter default paths
[Figure: Kruskal’s algorithm, max. default path = 8 edges]
[Figure: our refined Kruskal’s algorithm, avg. default path = 5 edges]
Multiple Memories
• To achieve high performance, use multiple memories and D²FA engines
• Multiple memories provide high aggregate bandwidth
• Multiple engines use the bandwidth effectively
  » However, worst-case performance may be low
    – No better than a single memory
  » May need complex circuitry to handle contention
• We propose a deterministic, contention-free memory mapping and compare it to a random mapping
Memory Mapping
• The memory mapping problem can be modeled as graph coloring
  » The graph is the set of default transition trees
  » Colors represent the memory modules
  » Color the nodes of the trees such that
    – Nodes along a default path are colored with different colors
    – All colors are uniformly used
• We propose two methods, naïve and adaptive
[Figures: naïve coloring; adaptive coloring]
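The two coloring constraints can be illustrated with a small greedy sketch. This is neither the naïve nor the adaptive method from the paper, whose details are not on the slides; it is a hypothetical greedy rule that gives each node the least-used color not already taken by one of its ancestors on the default path. The function name and the example tree are assumptions.

```python
from collections import Counter


def color_default_trees(default, num_colors):
    """Assign a memory module (color) to every state in the default trees.

    `default` maps a state to the target of its default transition.
    Guarantees distinct colors along every default path, and greedily
    prefers the color that has been used least so far.
    """
    children = {}
    for child, parent in default.items():
        children.setdefault(parent, []).append(child)
    roots = [p for p in children if p not in default]

    color, usage = {}, Counter()
    stack = [(root, ()) for root in roots]   # (node, colors of its ancestors)
    while stack:
        node, used = stack.pop()
        free = [c for c in range(num_colors) if c not in used]
        if not free:
            raise ValueError("a default path is longer than the number of memories")
        best = min(free, key=lambda c: (usage[c], c))
        color[node] = best
        usage[best] += 1
        for child in children.get(node, []):
            stack.append((child, used + (best,)))
    return color


default = {2: 1, 3: 1, 4: 2, 5: 2, 6: 3}     # hypothetical default tree rooted at 1
coloring = color_default_trees(default, num_colors=3)
print(coloring)
print(Counter(coloring.values()))
```

Because every state on a default path lives in a different memory module, the lookups needed to consume one character never hit the same module twice.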
Results
• Adaptive mapping leads to much more uniform color usage
  » Memories are uniformly used, little fragmentation
  » Up to 20% space saving with adaptive coloring
• Throughput results (300 MHz dual-port eSRAM)